Hi folks, thanks for maintaining the URI gem!
I hit the following error while processing a 65MB MIME body payload containing over 13 million percent-encoded characters:
    Regexp::TimeoutError POST /rails/action_mailbox/mailgun/inbound_emails/mime
    vendor/bundle/ruby/3.3.0/gems/uri-0.13.3/lib/uri/common.rb:400:in `match?': regexp match timeout (Regexp::TimeoutError)
      from vendor/bundle/ruby/3.3.0/gems/uri-0.13.3/lib/uri/common.rb:400:in `_decode_uri_component'
Ref:
My workaround was to monkey-patch URI.decode_www_form_component so that it falls back to a linear, Regexp-free decoder when the Regexp code path times out:
    module URIFormComponentLinearDecode
      ORIGINAL_DECODE_WWW_FORM_COMPONENT = URI.method(:decode_www_form_component)
      # Lookup table from the URI gem: "%XX" (and "+") => decoded byte string.
      DECODE_TABLE = URI.const_get(:TBLDECWWWCOMP_)

      def decode_www_form_component(str, enc = Encoding::UTF_8)
        ORIGINAL_DECODE_WWW_FORM_COMPONENT.call(str, enc)
      rescue Regexp::TimeoutError
        raise unless str.is_a?(String)

        Rails.logger.info("[URIFormComponentLinearDecode] bytesize=#{str.bytesize}")
        linear_decode_www_form_component(str, enc)
      end

      private

      # Decodes in a single pass over the bytes, without any Regexp.
      def linear_decode_www_form_component(str, enc)
        source = str.b
        # Allocate the binary buffer directly; calling .b on it would copy the
        # string and drop the reserved capacity.
        output = String.new(capacity: source.bytesize, encoding: Encoding::BINARY)
        index = 0

        while index < source.bytesize
          byte = source.getbyte(index)

          case byte
          when 37 # "%"
            raise ArgumentError, "invalid %-encoding (#{str})" unless index + 2 < source.bytesize

            encoded = source.byteslice(index, 3)
            decoded = DECODE_TABLE[encoded]
            raise ArgumentError, "invalid %-encoding (#{str})" unless decoded

            output << decoded
            index += 3
          when 43 # "+"
            output << DECODE_TABLE["+"]
            index += 1
          else
            output << byte
            index += 1
          end
        end

        output.force_encoding(enc)
      end
    end

    URI.singleton_class.prepend(URIFormComponentLinearDecode)
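To make the idea easier to discuss separately from the monkey patch, here is a minimal self-contained sketch of the same single-pass decoding. It assumes nothing from the URI internals: `LinearFormDecode` and its `decode` method are hypothetical names used only for illustration, and the hex lookup replaces the gem's `TBLDECWWWCOMP_` table. It is not meant as the final implementation.

```ruby
require "uri"

# Hypothetical helper, not part of the URI gem: decodes a form component in a
# single pass over the bytes, without using any Regexp.
module LinearFormDecode
  # Maps the byte value of each hex digit to its numeric value (0..15).
  HEX = {}
  "0123456789abcdefABCDEF".each_byte { |b| HEX[b] = b.chr.hex }
  HEX.freeze

  module_function

  def decode(str, enc = Encoding::UTF_8)
    source = str.b
    size = source.bytesize
    output = String.new(capacity: size, encoding: Encoding::BINARY)
    index = 0

    while index < size
      byte = source.getbyte(index)

      if byte == 37 # "%"
        # getbyte returns nil past the end, so a truncated escape fails here too.
        hi = HEX[source.getbyte(index + 1)]
        lo = HEX[source.getbyte(index + 2)]
        raise ArgumentError, "invalid %-encoding at byte #{index}" unless hi && lo

        output << ((hi << 4) | lo)
        index += 3
      elsif byte == 43 # "+"
        output << 32 # a "+" decodes to a space
        index += 1
      else
        output << byte
        index += 1
      end
    end

    output.force_encoding(enc)
  end
end

LinearFormDecode.decode("a%20b+c") # => "a b c"
```

Each input byte is visited once, so the running time grows linearly with the payload size and does not depend on `Regexp.timeout`.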
I was wondering:
- Have you run into this problem before?
- Do you have a better approach to it?
- Do you think a solution to this issue belongs in the URI codebase?
- Do you think it would make sense to use a native function in this case?
I'm happy to contribute a PR if you would like me to. Please let me know if you have any thoughts.
Thanks.