RegExp->AppendReplacement throwing error U_INDEX_OUTOFBOUNDS_ERROR

RegExp->AppendReplacement throwing error U_INDEX_OUTOFBOUNDS_ERROR

Sprague, Gary
Hi All.

Some months ago I worked up a new way to parse JSON data that relies heavily on regular expressions. One piece of the code is designed to take strings that contain braces or brackets and escape them prior to processing. This past weekend I discovered that this piece of the code was not working (clearly I did not test it enough). Someone at my company had entered braces into a field that was passed to us in a JSON feed, and that exposed the issue.

I get the following error when a string matching the regular expression below contains braces:
   ERROR: U_INDEX_OUTOFBOUNDS_ERROR at: appendreplacement with params: ...

So here is the portion of code that is breaking:

/*  First we find "value" strings that may have reserved characters that are not escaped. They are: { } [ ]  */
local('myre' = regexp(
          -find='(?<=:)\\s*"(?:(?:(?<=\\\\)["{}\\[\\]]|[^"])*(?:(?<!\\\\)[{}\\[\\]])+(?:(?<=\\\\)["{}\\[\\]]|[^"])*)+?"',
          -input=#_json)
          );

while(#myre->find);
    local('_replacement' = #myre->matchstring);
    // We replace any: { } [ ] that are not escaped in the matchstring
    #_replacement->replace(regexp(-find='((?<!\\\\)[{}\\[\\]]){1}',-replace='\\\\\\1'));
    // Now we ensure that all backslashes are doubled because the first backslash will be removed on append.
    #_replacement->replace(regexp(-find='(\\\\){1}',-replace='\\\\\\\\'));
    #myre->appendreplacement(#_replacement);
/while;
#myre->appendtail;

/*  Now we continue by passing the result above. */
#_json = #myre->output;
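For readers who know Java better than Lasso, here is a minimal sketch of the same find / appendreplacement / appendtail loop using java.util.regex.Matcher. It assumes, as the U_INDEX_OUTOFBOUNDS_ERROR code and the ICU header cited later in this thread suggest, that Lasso's regexp type wraps ICU, whose replacement-string rules for "\" and "$" Java's Matcher shares in the cases shown here. The VALUE pattern is a hypothetical, simplified stand-in for the -find pattern above (it ignores escaped quotes). The sketch shows why the backslash-doubling pass is needed: appendReplacement consumes one level of backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeBracesSketch {
    // Hypothetical, simplified stand-in for the Lasso -find pattern:
    // a double-quoted value that follows a colon (escaped quotes are not handled).
    static final Pattern VALUE = Pattern.compile("(?<=:)\\s*\"[^\"]*\"");
    // An unescaped { } [ or ], captured as group 1.
    static final Pattern RESERVED = Pattern.compile("(?<!\\\\)([{}\\[\\]])");

    static String escapeValues(String json) {
        Matcher m = VALUE.matcher(json);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Pass 1: put a backslash in front of every unescaped brace or bracket.
            String replacement = RESERVED.matcher(m.group()).replaceAll("\\\\$1");
            // Pass 2: double every backslash, since appendReplacement turns "\\" back into "\".
            replacement = replacement.replace("\\", "\\\\");
            // A "$" in the value would still be read as a group reference at this point.
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints {"note":"a \{b\} c"}
        System.out.println(escapeValues("{\"note\":\"a {b} c\"}"));
    }
}
```

An already-escaped brace (such as \{ in the data) is left alone by the lookbehind and comes out unchanged after the doubling pass.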

At first I figured that I had done something wrong, since the examples seem vague. On further research I found examples of “appendreplacement” being called with plain strings, just as I am doing in my code (Fletcher's example shows this: http://www.lassotalk.com/Re-Another-Regex-problem.lasso?169634 ).

So in my testing I get the following results:


  *   #myre->appendreplacement(#_replacement);    Returns the error: ERROR: U_INDEX_OUTOFBOUNDS_ERROR
  *   #myre->appendreplacement('\\0');  Replaces my string with a “0” (following the example on page 356 of the Lasso 8.6 Language Guide)
  *   #myre->appendreplacement('\$0');  Replaces my string with the match string.
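The three results above can be reproduced with Java's Matcher, which is a reasonable proxy if Lasso's regexp type wraps ICU (an assumption, suggested by the ICU-style error code). This minimal sketch assumes the Lasso literals '\\0' and '\$0' evaluate to the strings \0 and $0, and it uses a hypothetical stand-in pattern with no capturing groups. With a price such as $39.95 in the value (the trigger identified in the SOLVED post further down), passing the raw match as the replacement fails because "$3" names a group that does not exist; "\0" is an escaped literal 0; "$0" is the whole match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AppendReplacementCases {
    // Hypothetical stand-in pattern: a quoted value after a colon, with no capturing groups.
    static final Pattern VALUE = Pattern.compile("(?<=:)\"[^\"]*\"");

    // Replaces the first value with a fixed replacement string.
    static String replaceValue(String input, String replacement) {
        Matcher m = VALUE.matcher(input);
        StringBuffer out = new StringBuffer();
        if (m.find()) {
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out);
        return out.toString();
    }

    // Mirrors appendreplacement with an unescaped match string as the replacement.
    static String replaceValueWithItself(String input) {
        Matcher m = VALUE.matcher(input);
        StringBuffer out = new StringBuffer();
        if (m.find()) {
            m.appendReplacement(out, m.group());
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "{\"t\":\"Price $39.95\"}";
        try {
            replaceValueWithItself(input);              // "$3" refers to group 3
        } catch (IndexOutOfBoundsException e) {
            System.out.println("raw match: " + e.getMessage());
        }
        System.out.println(replaceValue(input, "\\0")); // escaped 0: {"t":0}
        System.out.println(replaceValue(input, "$0"));  // group 0: input unchanged
    }
}
```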

Has anyone else had experience (or issues) with this method?  Is this a bug or a documentation issue?

I am running Lasso 8.6.2 on Windows Server 2008 Standard. (I have also tested with the latest Lasso 8.6.3 on the same OS.)

Thanks in advance for any help.


Gary Sprague
TV Systems Engineer
HSN, 1 HSN Drive, St. Petersburg, FL 33729
Office 727.872.4489
[hidden email]


#############################################################

This message is sent to you because you are subscribed to
  the mailing list Lasso [hidden email]
Official list archives available at http://www.lassotalk.com
To unsubscribe, E-mail to: <[hidden email]>
Send administrative queries to  <[hidden email]>

Re: [EXTERNAL] RegExp->AppendReplacement throwing error U_INDEX_OUTOFBOUNDS_ERROR

Sprague, Gary
BTW, the "<smb://]>" strings in the sample code were not part of my posting. Apparently either my email client or some other system along the chain inserted them. If you remove them, the rest of the code should be accurate.

Gary Sprague

In reply to the post by Sprague, Gary on Mar 17, 2015, at 2:07 PM.

Re: RegExp->AppendReplacement throwing error U_INDEX_OUTOFBOUNDS_ERROR

Sprague, Gary
SOLVED!

Ok. I found that I needed to escape any $ signs in the string, because a $ followed by a number is treated as a group reference. What was missing from my example was pricing: the string contained "Cooks Event Price $39.95", and the regex treated the $ in the price as the start of a group reference. The solution is for me to do another pass over my #_replacement local variable just before the “appendreplacement”, like this:

#_replacement->replace(regexp(-find='((?<!\\\\)[\\$]){1}',-replace='\\\\\\1'));

(please ignore any "<smb://]>" strings that the Lassotalk server adds to the code)
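In Java terms, the extra pass above amounts to escaping "$" as well as "\" before calling appendReplacement. Java ships Matcher.quoteReplacement for exactly this; whether Lasso 8.6 has an equivalent I can't say, so this sketch also shows a hand-rolled version in the spirit of the Lasso passes. VALUE is a hypothetical, simplified stand-in pattern, and the method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDollarSketch {
    // Hypothetical, simplified stand-in for the Lasso -find pattern.
    static final Pattern VALUE = Pattern.compile("(?<=:)\\s*\"[^\"]*\"");

    // Hand-rolled escaping: double backslashes first, then escape every "$".
    // In this order the "\" added in front of "$" is not doubled again,
    // so no lookbehind is needed.
    static String escapeForAppend(String s) {
        return s.replace("\\", "\\\\").replace("$", "\\$");
    }

    // Copies every value through appendReplacement; with proper escaping the output equals the input.
    static String copyValues(String json, boolean useQuoteReplacement) {
        Matcher m = VALUE.matcher(json);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = useQuoteReplacement
                    ? Matcher.quoteReplacement(m.group())
                    : escapeForAppend(m.group());
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String json = "{\"title\":\"Cooks Event Price $39.95\"}";
        System.out.println(copyValues(json, false));
        System.out.println(copyValues(json, true));
    }
}
```

One design point, if I read the Lasso passes correctly: because backslashes are doubled first, every "$" can be escaped unconditionally. A lookbehind that skips a "$" preceded by a backslash would leave a value such as \$39 exposed, since after doubling that "$" follows an escaped backslash rather than an escaping one.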

Gary Sprague

In reply to the post by Sprague, Gary on Mar 17, 2015, at 2:41 PM.

Re: RegExp->AppendReplacement throwing error U_INDEX_OUTOFBOUNDS_ERROR

Bil Corry
The dollar sign is but one of many special characters that have to be
properly encoded.  I recommend encoding every character to be safe.

- Bil
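Bil's "encode every character" approach can be sketched in Java: prefixing each character with a backslash makes appendReplacement copy it literally, because in Java's replacement syntax a backslash followed by any character yields that character. This sketch follows Java's rules only. I have not verified that ICU (and therefore Lasso) treats every backslash sequence the same way; ICU may interpret some sequences, such as \u escapes, specially. The method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeEverySketch {
    // Prefixes every char with a backslash. Java's appendReplacement turns "\x" into a
    // literal x for any char x, so the text is inserted verbatim whatever it contains.
    static String escapeAll(String s) {
        StringBuilder sb = new StringBuilder(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            sb.append('\\').append(s.charAt(i));
        }
        return sb.toString();
    }

    // Replaces every match of regex in input with the literal text.
    static String replaceAllLiteral(String regex, String input, String literal) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, escapeAll(literal));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints a$1\{b
        System.out.println(replaceAllLiteral("X", "aXb", "$1\\{"));
    }
}
```

The trade-off versus escaping only "\" and "$" is a replacement string twice as long; in exchange you don't need to know which characters are special.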

In reply to the post by Sprague, Gary on Tue, Mar 17, 2015, at 8:46 PM.

Re: RegExp->AppendReplacement throwing error U_INDEX_OUTOFBOUNDS_ERROR

Sprague, Gary
Bil,

Based on documentation from other projects, it looks like the dollar sign “$” is specifically the issue with “appendreplacement”: it needs to be escaped unless it is meant as a capture group number reference.

In my case, my only concern was JSON strings, in the method I use to convert them into maps and arrays.

The documentation for this in the Lasso 8.6 Language Guide is sparse and partly inaccurate. For example, '\\0' (double-backslash 0) does not return the matchstring; it just returns a “0”, whereas '\$0' does return the matchstring.

Based on the identical method names, it looks like Kyle used openly available source code (a RegexMatcher implementation) for Lasso's regexp type. You can see what applies here:

http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#appendReplacement(java.lang.StringBuffer,%20java.lang.String)

and possibly here:

https://www.opensource.apple.com/source/ICU/ICU-6.2.6/icuSources/i18n/unicode/regex.h
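The Java documentation linked above also spells out how a group number is read: the first digit after "$" always belongs to the reference, and later digits are added only while they still form a legal group number. This minimal sketch shows that rule and the escaped form. It reflects Java's behavior only; I have not confirmed that ICU parses multi-digit references the same way, and the expand helper is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupReferenceSketch {
    // Applies a single appendReplacement/appendTail pass to the first match.
    static String expand(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        if (m.find()) {
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // One group: "$10" is read as group 1 followed by a literal "0".
        System.out.println(expand("(a)", "a", "$10"));   // a0
        // A group reference surrounded by literal text.
        System.out.println(expand("(a)", "a", "[$1]"));  // [a]
        // Escaped dollar sign: copied literally, no group lookup.
        System.out.println(expand("(a)", "a", "\\$1"));  // $1
    }
}
```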

Gary Sprague

In reply to the post by Bil Corry on Mar 17, 2015, at 4:50 PM.