ruby_parser
Sorry, the diff of this file is too big to display
Sorry, the diff of this file is not supported yet
@@ -177,3 +177,3 @@ #!/usr/bin/env ruby -w | ||
| when /^$/ then | ||
| when /^(\d+) (\$?\w+): (.*)/ then # yacc | ||
| when /^(\d+) (\$?[@\w]+): (.*)/ then # yacc | ||
| rule = $2 | ||
@@ -203,3 +203,3 @@ order << rule unless rules.has_key? rule | ||
| else | ||
| warn "unparsed: #{$.}: #{line.chomp}" | ||
| warn "unparsed: #{$.}: #{line.strip.inspect}" | ||
| end | ||
@@ -206,0 +206,0 @@ end |
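The widened pattern in the hunk above lets the normalizer accept rule names that contain `@`. A minimal sketch comparing the two patterns follows. It assumes the names in question look like `$@1` (the style bison uses for anonymous mid-rule actions), and the sample line is made up for illustration rather than taken from a real `parse.output`:

```ruby
# Patterns copied from the hunk above.
OLD_RULE = /^(\d+) (\$?\w+): (.*)/     # before the change
NEW_RULE = /^(\d+) (\$?[@\w]+): (.*)/  # after the change

# Hypothetical input line; real bison output may be shaped differently.
sample = '12 $@1: %empty'

old_match = OLD_RULE.match(sample)  # "@" is not a word character, so no match
new_match = NEW_RULE.match(sample)  # "@" is now allowed inside the rule name

puts "old pattern matched: #{!old_match.nil?}"
puts "new pattern rule name: #{new_match[2]}"
```

Lines that the old pattern already matched, such as a plain `arg: ...` rule, still match the new one, since `[@\w]` is a superset of `\w`.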
+133
-0
@@ -58,1 +58,134 @@ # Quick Notes to Help with Debugging | ||
| changes but I don't have that set up at this point. | ||
| ## Adding New Grammar Productions | ||
| Ruby adds stuff to the parser ALL THE TIME. It's actually hard to keep | ||
| up with, but I've added some tools and shown what a typical workflow | ||
| looks like. Let's say you want to add ruby 2.7's "beginless range" (eg | ||
| `..42`). | ||
| Whenever there's a language feature missing, I start with comparing | ||
| the parse trees between MRI and RP: | ||
| ### Structural Comparing | ||
| There's a bunch of rake tasks `compare27`, `compare26`, etc that try | ||
| to normalize and diff MRI's parse.y parse tree (just the structure of | ||
| the tree in yacc) to ruby\_parser's parse tree (racc). It's the first | ||
| thing I do when I'm adding a new version. Stub out all the version | ||
| differences, and then start to diff the structure and move | ||
| ruby\_parser towards the new changes. | ||
| Some differences are just gonna be there... but here's an example of a | ||
| real diff between MRI 2.7 and ruby_parser as of today: | ||
| ```diff | ||
| arg tDOT3 arg | ||
| arg tDOT2 | ||
| arg tDOT3 | ||
| - tBDOT2 arg | ||
| - tBDOT3 arg | ||
| arg tPLUS arg | ||
| arg tMINUS arg | ||
| arg tSTAR2 arg | ||
| ``` | ||
| This is a new language feature that ruby_parser doesn't handle yet. | ||
| It's in MRI (the left hand side of the diff) but not ruby\_parser (the | ||
| right hand side) so it is a `-` or missing line. | ||
| Some other diffs will have both `+` and `-` lines. That usually | ||
| happens when MRI has been refactoring the grammar. Sometimes I choose | ||
| to adapt those refactorings and sometimes it starts to get too | ||
| difficult to maintain multiple versions of ruby parsing in a single | ||
| file. | ||
| But! This structural comparing is always a place you should look when | ||
| ruby_parser is failing to parse something. Maybe it just hasn't been | ||
| implemented yet and the easiest place to look is the diff. | ||
| ### Starting Test First | ||
| The next thing I do is to add a parser test to cover that feature. I | ||
| usually start with the parser and work backwards towards the lexer as | ||
| needed, as I find it structures things properly and keeps things goal | ||
| oriented. | ||
| So, make a new parser test, usually in the versioned section of the | ||
| parser tests. | ||
| ``` | ||
| def test_beginless2 | ||
| rb = "..10\n; ..a\n; c" | ||
| pt = s(:block, | ||
| s(:dot2, nil, s(:lit, 10).line(1)).line(1), | ||
| s(:dot2, nil, s(:call, nil, :a).line(2)).line(2), | ||
| s(:call, nil, :c).line(3)).line(1) | ||
| assert_parse_line rb, pt, 1 | ||
| flunk "not done yet" | ||
| end | ||
| ``` | ||
| (In this case I copied and modified the tests for open ranges from 2.6.) | ||
| Then I run it to get my first error: | ||
| ``` | ||
| % rake N=/beginless/ | ||
| ... | ||
| E | ||
| Finished in 0.021814s, 45.8421 runs/s, 0.0000 assertions/s. | ||
| 1) Error: | ||
| TestRubyParserV27#test_whatevs: | ||
| Racc::ParseError: (string):1 :: parse error on value ".." (tDOT2) | ||
| GEMS/2.7.0/gems/racc-1.5.0/lib/racc/parser.rb:538:in `on_error' | ||
| WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1304:in `on_error' | ||
| (eval):3:in `_racc_do_parse_c' | ||
| (eval):3:in `do_parse' | ||
| WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1329:in `block in process' | ||
| RUBY/lib/ruby/2.7.0/timeout.rb:95:in `block in timeout' | ||
| RUBY/lib/ruby/2.7.0/timeout.rb:33:in `block in catch' | ||
| RUBY/lib/ruby/2.7.0/timeout.rb:33:in `catch' | ||
| RUBY/lib/ruby/2.7.0/timeout.rb:33:in `catch' | ||
| RUBY/lib/ruby/2.7.0/timeout.rb:110:in `timeout' | ||
| WORK/ruby_parser/dev/lib/ruby_parser_extras.rb:1317:in `process' | ||
| WORK/ruby_parser/dev/test/test_ruby_parser.rb:4198:in `assert_parse' | ||
| WORK/ruby_parser/dev/test/test_ruby_parser.rb:4221:in `assert_parse_line' | ||
| WORK/ruby_parser/dev/test/test_ruby_parser.rb:4451:in `test_whatevs' | ||
| ``` | ||
| For starters, we know the missing production is for `tBDOT2 arg`. It | ||
| is currently blowing up because it is getting `tDOT2` and simply | ||
| doesn't know what to do with it, so it raises the error. As the diff | ||
| suggests, that's the wrong token to begin with, so it is probably time | ||
| to also create a lexer test: | ||
| ``` | ||
| def test_yylex_bdot2 | ||
| assert_lex3("..42", | ||
| s(:dot2, nil, s(:lit, 42)), | ||
| :tBDOT2, "..", EXPR_BEG, | ||
| :tINTEGER, "42", EXPR_NUM) | ||
| flunk "not done yet" | ||
| end | ||
| ``` | ||
| This one is mostly speculative at this point. It says "if we're lexing | ||
| this string, we should get this sexp if we fully parse it, and the | ||
| lexical stream should look like this"... That last bit is mostly made | ||
| up at this point. Sometimes I don't know exactly what expression state | ||
| things should be in until I start really digging in. | ||
| At this point, I have 2 failing tests that are pointing me in the | ||
| right direction. It's now a matter of digging through | ||
| `compare/parse26.y` to see how the lexer differs and implementing | ||
| it... | ||
| But this is a good start to the doco for now. I'll add more later. |
+19
-0
@@ -0,1 +1,20 @@ | ||
| === 3.16.0 / 2021-05-15 | ||
| * 1 major enhancement: | ||
| * Added tentative 3.0 support. | ||
| * 3 minor enhancements: | ||
| * Added lexing for "beginless range" (bdots). | ||
| * Added parsing for bdots. | ||
| * Updated rake compare task to download xz files, bumped versions, etc | ||
| * 4 bug fixes: | ||
| * Bump rake dependency to >= 10, < 15. (presidentbeef) | ||
| * Bump sexp_processor dependency to 4.15.1+. (pravi) | ||
| * Fixed minor state mismatch at the end of parsing to make diffing a little cleaner. | ||
| * Fixed normalizer to deal with new bison token syntax | ||
| === 3.15.1 / 2021-01-10 | ||
@@ -2,0 +21,0 @@ |
+19
-0
@@ -28,2 +28,7 @@ # frozen_string_literal: true | ||
| BTOKENS = { | ||
| ".." => :tBDOT2, | ||
| "..." => :tBDOT3, | ||
| } | ||
| TOKENS = { | ||
@@ -135,2 +140,6 @@ "!" => :tBANG, | ||
| def expr_beg? | ||
| lex_state =~ EXPR_BEG | ||
| end | ||
| def expr_dot? | ||
@@ -585,2 +594,8 @@ lex_state =~ EXPR_DOT | ||
| def process_dots text | ||
| tokens = ruby27plus? && expr_beg? ? BTOKENS : TOKENS | ||
| result EXPR_BEG, tokens[text], text | ||
| end | ||
| def process_float text | ||
@@ -1142,2 +1157,6 @@ rb_compile_error "Invalid numeric format" if text =~ /__/ | ||
| def ruby27plus? | ||
| parser.class.version >= 27 | ||
| end | ||
| def scan re | ||
@@ -1144,0 +1163,0 @@ ss.scan re |
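The token selection in `process_dots` can be tried in isolation. The sketch below is a standalone approximation, not the lexer's code: `dots_token` is a hypothetical helper, the lexer state is reduced to a boolean, and `TOKENS` is cut down to an assumed two-entry subset.

```ruby
# Beginless-range tokens, as added in the hunk above.
BTOKENS = { ".." => :tBDOT2, "..." => :tBDOT3 }.freeze

# Assumed subset of the lexer's regular token table.
TOKENS = { ".." => :tDOT2, "..." => :tDOT3 }.freeze

# Hypothetical helper mirroring the ternary in process_dots:
# only a 2.7+ parser in an expression-beginning state gets the "B" tokens.
def dots_token(text, version:, expr_beg:)
  table = version >= 27 && expr_beg ? BTOKENS : TOKENS
  table.fetch(text)
end

puts dots_token("..",  version: 27, expr_beg: true)   # start of expression on 2.7
puts dots_token("..",  version: 27, expr_beg: false)  # after an operand on 2.7
puts dots_token("...", version: 26, expr_beg: true)   # older parser version
```

Keeping the choice in one method means the `.rex` rule only has to call `process_dots text`, and the version check stays out of the generated lexer.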
@@ -141,3 +141,3 @@ # encoding: UTF-8 | ||
| when text = ss.scan(/\.\.\.?/) then | ||
| action { result EXPR_BEG, TOKENS[text], text } | ||
| process_dots text | ||
| when ss.skip(/\.\d/) then | ||
@@ -144,0 +144,0 @@ action { rb_compile_error "no .<digit> floating literal anymore put 0 before dot" } |
@@ -32,3 +32,3 @@ # encoding: ASCII-8BIT | ||
| module RubyParserStuff | ||
| VERSION = "3.15.1" | ||
| VERSION = "3.16.0" | ||
@@ -119,3 +119,3 @@ attr_accessor :lexer, :in_def, :in_single, :file | ||
| v = self.class.name[/2\d/] | ||
| v = self.class.name[/[23]\d/] | ||
| raise "Bad Class name #{self.class}" unless v | ||
@@ -122,0 +122,0 @@ |
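The version pattern change above matters because a class name such as `Ruby30Parser` contains no `2` followed by a digit. A small sketch of the difference, using `String#[]` with a regexp as the hunk does (the constant names are made up for illustration):

```ruby
OLD_VERSION_RE = /2\d/     # before: only matches 2x versions
NEW_VERSION_RE = /[23]\d/  # after: also matches 3x versions

%w[Ruby27Parser Ruby30Parser].each do |name|
  old_v = name[OLD_VERSION_RE]  # nil when nothing matches
  new_v = name[NEW_VERSION_RE]
  puts "#{name}: old=#{old_v.inspect} new=#{new_v.inspect}"
end
```

With the old pattern, the `raise "Bad Class name ..."` guard would fire for any 3.x parser class.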
@@ -82,2 +82,3 @@ require "ruby_parser_extras" | ||
| require "ruby27_parser" | ||
| require "ruby30_parser" | ||
@@ -87,2 +88,3 @@ class RubyParser # HACK | ||
| class V30 < ::Ruby30Parser; end | ||
| class V27 < ::Ruby27Parser; end | ||
@@ -89,0 +91,0 @@ class V26 < ::Ruby26Parser; end |
+2
-0
@@ -29,2 +29,4 @@ .autotest | ||
| lib/ruby27_parser.y | ||
| lib/ruby30_parser.rb | ||
| lib/ruby30_parser.y | ||
| lib/ruby_lexer.rb | ||
@@ -31,0 +33,0 @@ lib/ruby_lexer.rex |
+28
-12
@@ -16,3 +16,3 @@ # -*- ruby -*- | ||
| V2 = %w[20 21 22 23 24 25 26 27] | ||
| V2 = %w[20 21 22 23 24 25 26 27 30] | ||
| V2.replace [V2.last] if ENV["FAST"] # HACK | ||
@@ -25,6 +25,13 @@ | ||
| dependency "sexp_processor", "~> 4.9" | ||
| dependency "rake", "< 11", :developer | ||
| dependency "sexp_processor", ["~> 4.15", ">= 4.15.1"] | ||
| dependency "rake", [">= 10", "< 15"], :developer | ||
| dependency "oedipus_lex", "~> 2.5", :developer | ||
| # NOTE: Ryan!!! Stop trying to fix this dependency! Isolate just | ||
| # can't handle having a faux-gem half-installed! Stop! Just `gem | ||
| # install racc` and move on. Revisit this ONLY once racc-compiler | ||
| # gets split out. | ||
| dependency "racc", "~> 1.5", :developer | ||
| require_ruby_version [">= 2.1", "< 4"] | ||
@@ -97,3 +104,3 @@ | ||
| dir = v[/^\d+\.\d+/] | ||
| url = "https://cache.ruby-lang.org/pub/ruby/#{dir}/ruby-#{v}.tar.bz2" | ||
| url = "https://cache.ruby-lang.org/pub/ruby/#{dir}/ruby-#{v}.tar.xz" | ||
| path = File.basename url | ||
@@ -110,3 +117,3 @@ unless File.exist? path then | ||
| parse_y = "parse#{v}.y" | ||
| tarball = "ruby-#{version}.tar.bz2" | ||
| tarball = "ruby-#{version}.tar.xz" | ||
| ruby_dir = "ruby-#{version}" | ||
@@ -131,6 +138,9 @@ diff = "diff#{v}.diff" | ||
| desc "fetch all tarballs" | ||
| task :fetch => c_tarball | ||
| file c_parse_y => c_tarball do | ||
| in_compare do | ||
| extract_glob = case version | ||
| when /2\.7/ | ||
| when /2\.7|3\.0/ | ||
| "{id.h,parse.y,tool/{id2token.rb,lib/vpath.rb}}" | ||
@@ -140,3 +150,3 @@ else | ||
| end | ||
| system "tar yxf #{tarball} #{ruby_dir}/#{extract_glob}" | ||
| system "tar Jxf #{tarball} #{ruby_dir}/#{extract_glob}" | ||
@@ -156,5 +166,10 @@ Dir.chdir ruby_dir do | ||
| bison = Dir["/opt/homebrew/opt/bison/bin/bison", | ||
| "/usr/local/opt/bison/bin/bison", | ||
| `which bison`.chomp, | ||
| ].first | ||
| file c_mri_txt => [c_parse_y, normalize] do | ||
| in_compare do | ||
| sh "bison -r all #{parse_y}" | ||
| sh "#{bison} -r all #{parse_y}" | ||
| sh "./normalize.rb parse#{v}.output > #{mri_txt}" | ||
@@ -204,6 +219,7 @@ rm ["parse#{v}.output", "parse#{v}.tab.c"] | ||
| ruby_parse "2.3.8" | ||
| ruby_parse "2.4.9" | ||
| ruby_parse "2.5.8" | ||
| ruby_parse "2.6.6" | ||
| ruby_parse "2.7.1" | ||
| ruby_parse "2.4.10" | ||
| ruby_parse "2.5.9" | ||
| ruby_parse "2.6.7" | ||
| ruby_parse "2.7.3" | ||
| ruby_parse "3.0.1" | ||
@@ -210,0 +226,0 @@ task :debug => :isolate do |
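The `bison` lookup in the Rakefile hunks above relies on `Dir[]` returning only the paths that exist, in the order the patterns were given, so `.first` picks the highest-priority candidate that is actually installed. A minimal sketch of that idiom, using temporary files in place of real bison installs (`first_existing` is a hypothetical helper, not part of the Rakefile):

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: return the first candidate path that exists,
# or nil when none of them do.
def first_existing(*candidates)
  Dir[*candidates].first
end

Dir.mktmpdir do |dir|
  missing = File.join(dir, "homebrew", "bison")  # never created
  present = File.join(dir, "bison")
  FileUtils.touch present

  puts first_existing(missing, present)          # prints the path of `present`
  puts first_existing(missing).inspect           # nothing exists, so nil
end
```

One thing to keep in mind with this idiom: when no candidate exists the result is `nil`, so the interpolated command would start with an empty program name.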
+2
-2
@@ -200,4 +200,4 @@ #!/usr/bin/env ruby -ws | ||
| a, b, c = $1.upcase, $2.upcase, $3 | ||
| a.gsub! /EXPR_/, "" | ||
| b.gsub! /EXPR_/, "" | ||
| a.gsub!(/EXPR_/, "") | ||
| b.gsub!(/EXPR_/, "") | ||
| if c && $v then | ||
@@ -204,0 +204,0 @@ puts "lex_state: #{a} -> #{b} at #{c}" |