[reSIProcate] Uri escaping

Fri Apr 9 12:07:32 CDT 2010

I guess that would change the default to always escaping anything outside of US ASCII (be it valid UTF-8 bytes or garbage).  Sounds good to me.

Thanks,
Jeremy

From: slgodin at gmail.com [mailto:slgodin at gmail.com] On Behalf Of Scott Godin
Sent: April-09-10 6:53 AM
To: Jeremy Geras
Cc: resiprocate-devel at resiprocate.org
Subject: Re: [reSIProcate] Uri escaping

Why not just fix Uri::shouldEscapeUserChar so that if c < 0 then return true, instead of false?

inline bool
Uri::shouldEscapeUserChar(char c)
{
   if(!mEncodingReady)
   {
      initialiseEncodingTables();
   }

   if(c < 0)
   {
      return false;
   }

   return mUriEncodingUserTable[c];
}

Scott

2010/4/9 Jeremy Geras <jgeras at counterpath.com<mailto:jgeras at counterpath.com>>
Here's an attempt at fixing escapeToStream(..) which seems to work OK...

template<class Predicate> EncodeStream&
Data::escapeToStream(EncodeStream& str, Predicate shouldEscape) const
{
   static char hex[] = "0123456789ABCDEF";

   if (empty())
   {
      return str;
   }

   const char* p = mBuf;
   const char* e = mBuf + mSize;

   while (p < e)
   {
      // ?abr? Why is this special cased? Removing this code
      // does not change the behavior of this method.
      if (*p == '%'
          && e - p > 2
          && isHex(*(p+1))
          && isHex(*(p+2)))
      {
         str.write(p, 3);
         p+=3;
      }
      else if (shouldEscape(*p))
      {
         int hi = (*p & 0xF0)>>4;
         int low = (*p & 0x0F);

         str << '%' << hex[hi] << hex[low];
         p++;
      }
// BEGIN NEW
      else if ((*p) < 0)
      {
         // count the number of UTF-8 bytes
         if (((*p) | 0x1F) == ~0x20)
         {
            Data utf8(Data::Borrow, p, 2);
            utf8.urlEncode(str);
            p+=2;
         }
         else if (((*p) | 0x0F) == ~0x10)
         {
            Data utf8(Data::Borrow, p, 3);
            utf8.urlEncode(str);
            p+=3;
         }
         else if (((*p) | 0x07) == ~0x08)
         {
            Data utf8(Data::Borrow, p, 4);
            utf8.urlEncode(str);
            p+=4;
         }
         else
         {
            str.put(*p++);
         }
      }
// END NEW
      else
      {
         str.put(*p++);
      }
   }
   return str;
}

From: resiprocate-devel-bounces at resiprocate.org<mailto:resiprocate-devel-bounces at resiprocate.org> [mailto:resiprocate-devel-bounces at resiprocate.org<mailto:resiprocate-devel-bounces at resiprocate.org>] On Behalf Of Jeremy Geras
Sent: April-08-10 4:05 PM

To: Scott Godin
Cc: resiprocate-devel at resiprocate.org<mailto:resiprocate-devel at resiprocate.org>
Subject: Re: [reSIProcate] Uri escaping

So my latest thinking is that my attempted fix is definitely bad, since it doesn't take into account the special 3261 rules (like + being a valid user character).

So I guess the real fix is to fix escapeToStream(..) so that it can detect UTF-8 characters and % HEX HEX encode them...?

From: Jeremy Geras
Sent: April-08-10 2:41 PM
To: 'Scott Godin'
Cc: resiprocate-devel at resiprocate.org<mailto:resiprocate-devel at resiprocate.org>
Subject: RE: [reSIProcate] Uri escaping

Hmmm... Won't Uri::shouldEscapeUserChar return false with non-US ASCII characters?  (That's what I'm seeing... c == -60) when the URI contains a character like Ă.

From: slgodin at gmail.com<mailto:slgodin at gmail.com> [mailto:slgodin at gmail.com<mailto:slgodin at gmail.com>] On Behalf Of Scott Godin
Sent: April-08-10 2:31 PM
To: Jeremy Geras
Cc: resiprocate-devel at resiprocate.org<mailto:resiprocate-devel at resiprocate.org>
Subject: Re: [reSIProcate] Uri escaping

I agree that RFC3261 specifies the user part should be escaped (ie. % HEXDIG HEXDIG):

user             =  1*( unreserved / escaped / user-unreserved )
However looking at the Data::escapeToStream method - it should be escaping using (% HEXDIG HEXDIG) - it's not really clear to me how you are ending up with non-escaped characters in the encoding.  We should figure out why escapeToStream is not working for you.

Scott

On Thu, Apr 8, 2010 at 4:59 PM, Jeremy Geras <jgeras at counterpath.com<mailto:jgeras at counterpath.com>> wrote:
Hi,

General question:  How should URIs be escaped?

See the attached trace for an example of what we're seeing with resip -- note the Contact header in the NOTIFY, and the subsequent Request-URI in the re-SUBSCRIBE.  Should the URI in the request line be URL-encoded?  (Pouring through the BNF in 3261 leads me to think that it should...)

Note that this is a trace from a deployment where the re-SUBSCRIBE actually fails because of the lack of proper escaping.  I tried making the following change in Uri.cxx, and with that in place the re-SUBSCRIBE works.  However I'm not sure if this is correct, or if this is the right spot.  Thoughts?

Thanks,
Jeremy

// should not encode user parameters unless its a tel?
EncodeStream&
Uri::encodeParsed(EncodeStream& str) const
{
   str << mScheme << Symbols::COLON;
   if (!mUser.empty())
   {
#ifdef HANDLE_CHARACTER_ESCAPING
      //mUser.escapeToStream(str, shouldEscapeUserChar); <== ORIGINAL CODE
      mUser.urlEncode(str);
#else
      str << mUser;
#endif

 ... snip ...

_______________________________________________
resiprocate-devel mailing list
resiprocate-devel at resiprocate.org<mailto:resiprocate-devel at resiprocate.org>
https://list.resiprocate.org/mailman/listinfo/resiprocate-devel

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://list.resiprocate.org/pipermail/resiprocate-devel/attachments/20100409/e8ef35ea/attachment.htm>