Folding lines on "octets"

Jolle Carlestam-2
The iCalendar format stipulates the following:
Lines of text SHOULD NOT be longer than 75 octets

I gather that octets mean bytes. If so, how do I work out where to put my line feeds? In almost all cases I could probably just wrap based on a character count. But if I want my solution to be foolproof regardless of the language used in the strings, how would I do that?

This is Lasso 9.

HDB
Jolle
#############################################################
Attend the Lasso Developer Conference 2014!
October 1-3, 2014 at Treefrog HQ, Newmarket, Ontario, Canada
http://www.lassosoft.com/LDC-newmarket-2014

#############################################################

This message is sent to you because you are subscribed to
  the mailing list Lasso [hidden email]
Official list archives available at http://www.lassotalk.com
To unsubscribe, E-mail to: <[hidden email]>
Send administrative queries to  <[hidden email]>

Re: Folding lines on "octets"

Jolle Carlestam-2
On 14 Jul 2014, at 23:52, Jolle Carlestam <[hidden email]> wrote:

> The iCalendar format stipulates the following:
> Lines of text SHOULD NOT be longer than 75 octets
>
> I gather that octets mean bytes. If so, how do I work out where to put my line feeds? In almost all cases I could probably just wrap based on a character count. But if I want my solution to be foolproof regardless of the language used in the strings, how would I do that?
>
> This is Lasso 9.

Think I have part of the answer.
string('Here is a räksmörgås') -> size
-> 23
bytes('Here is a räksmörgås') -> size
-> 29

So running bytes('my string') -> size will help me find the sections I need to work with. But I still have no good idea how to insert a CRLF followed by a space.
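
For comparison outside Lasso, here is a small Python sketch of the same character-versus-octet check. It is only an illustration, not what the thread uses. In Python the sample string comes out as 20 characters and 23 UTF-8 octets. The 23 and 29 reported above would match a string whose UTF-8 bytes had been read as Latin-1 before counting; that is one possible explanation, not something the thread confirms.

```python
# Sample string from the post, written with escapes so the source file encoding cannot interfere.
text = "Here is a r\u00e4ksm\u00f6rg\u00e5s"

char_count = len(text)                    # number of code points
octet_count = len(text.encode("utf-8"))   # number of bytes after UTF-8 encoding

# Assumption for illustration: UTF-8 bytes mistakenly decoded as Latin-1 before counting.
garbled = text.encode("utf-8").decode("latin-1")
garbled_chars = len(garbled)
garbled_octets = len(garbled.encode("utf-8"))

print(char_count, octet_count)        # 20 23
print(garbled_chars, garbled_octets)  # 23 29
```

Either way, the point of the post holds: the octet count is what matters for the 75-octet limit, and it can differ from the character count.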

HDB
Jolle

Re: Folding lines on "octets"

fletcher sandbeck-2
In Lasso 8 the tag encode_quotedprintablebytes would do this.  It was used by the email system to implement the encode_quotedprintable tag which is defined like this.

define_tag: 'quotedprintable',
    -priority = 'replace',
    -required = 'data',
    -optional = 'charset',
    -namespace = 'encode_';
  if: (local_defined: 'charset') && (#charset != 'utf-8');
    fail_if: (#charset != 'iso-8859-1') && !(string_validcharset: #charset), -9956, '[Encode_QuotedPrintable] Expected a valid character set but got ' + #charset + '.';
    local: 'bytes' = (bytes: #data, #charset);
  else;
    local: 'bytes' = (bytes: #data);
  /if;
  #bytes = (encode_quotedprintablebytes: #bytes);
  return: (string: #bytes, 'iso-8859-1');
/define_tag;

[fletcher]


Re: Folding lines on "octets"

Jolle Carlestam-2
On 15 Jul 2014, at 00:20, Fletcher Sandbeck <[hidden email]> wrote:

> In Lasso 8 the tag encode_quotedprintablebytes would do this.  It was used by the email system to implement the encode_quotedprintable tag which is defined like this.
>
> define_tag: 'quotedprintable',
>    -priority = 'replace',
>    -required = 'data',
>    -optional = 'charset',
>    -namespace = 'encode_';
>  if: (local_defined: 'charset') && (#charset != 'utf-8');
>    fail_if: (#charset != 'iso-8859-1') && !(string_validcharset: #charset), -9956, '[Encode_QuotedPrintable] Expected a valid character set but got ' + #charset + '.';
>    local: 'bytes' = (bytes: #data, #charset);
>  else;
>    local: 'bytes' = (bytes: #data);
>  /if;
>  #bytes = (encode_quotedprintablebytes: #bytes);
>  return: (string: #bytes, 'iso-8859-1');
> /define_tag;
>
> [fletcher]


That is interesting information, thanks Fletcher!

Using that to explore what Lasso 9 has under the hood, there seems to be no encode_quotedprintablebytes defined. There is, however, an encode_quotedprintable defined like this:

define encode_quotedprintable(b::bytes) => #b->encodeqp

It looks like a wrapper for bytes -> encodeqp. Unfortunately there is no definition of that method available in the open-sourced section. Testing it shows a good effort, but it does not meet my needs. The main flaw is that it does not limit itself to adding line feeds. It also encodes all non-ASCII chars in the string. My "räksmörgås" suddenly looks like this:
r=C3=A4ksm=C3=B6rg=C3=A5s

That is a bit too much interference for my liking. The iCalendar format has no beef with plain UTF-8, so there’s no need for quoted-printable encoding. All I want it to do is add the line feeds.

It would be good to have access to the definition of bytes -> encodeqp, so that I could make a variation that only adds the line feeds and a space.
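
The same effect can be reproduced outside Lasso. The sketch below uses Python's standard quopri module, purely as an illustration of what quoted-printable does to the sample word; it says nothing about how Lasso's encodeqp is implemented.

```python
import quopri

word = "r\u00e4ksm\u00f6rg\u00e5s"   # "räksmörgås"
raw = word.encode("utf-8")

# Quoted-printable rewrites every non-ASCII octet as =XX, it does not only add line breaks.
encoded = quopri.encodestring(raw)
print(encoded)  # b'r=C3=A4ksm=C3=B6rg=C3=A5s'

# Decoding restores the original octets.
assert quopri.decodestring(encoded) == raw
```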

HDB
Jolle

Re: Folding lines on "octets"

Ke Carlton-3
In reply to this post by Jolle Carlestam-2
It sounds like you want to do something like this:

local(out = bytes, delim = '\r\n ')

with word in 'Here is a räksmörgås'->eachword
let append = bytes((#out->size ? ' ' | '') + #word)
let line = #out->split(#delim)->last
do {
   #line->size + #append->size  > 75
   ? #out->append(#delim + #append)
   | #out->append(#append)
}

There's probably a nicer way of doing the #out->split bit, but heh it works.

Ke




Re: Folding lines on "octets"

Jolle Carlestam-2
On 15 Jul 2014, at 00:43, Ke Carlton <[hidden email]> wrote:

> It sounds like you want to do something like this:
>
> local(out = bytes, delim = '\r\n ')
>
> with word in 'Here is a räksmörgås'->eachword
> let append = bytes((#out->size ? ' ' | '') + #word)
> let line = #out->split(#delim)->last
> do {
>   #line->size + #append->size  > 75
>   ? #out->append(#delim + #append)
>   | #out->append(#append)
> }
>
> There's probably a nicer way of doing the #out->split bit, but heh it works.

I like it, but I don’t think it is fail-safe. Using words as the smallest unit runs into trouble when the wrap has to happen inside a long URL, for example. That’s where I stumbled into this to start with: a URL that was three lines long when wrapped…

I will continue the experiments.
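
To make the URL problem concrete, here is a small Python illustration (not Lasso). The URL is hypothetical and invented for the example. Since a word-based fold can only break at spaces, any single word wider than 75 octets ends up on an overlong line.

```python
# Hypothetical long URL, used only for illustration.
url = "http://www.example.com/" + "segment/" * 20
text = "See " + url

# A word-based fold treats each space-separated word as indivisible.
words = text.split(" ")
longest = max(len(w.encode("utf-8")) for w in words)

print(longest)        # 183
print(longest > 75)   # True: this word alone overflows a 75-octet line
```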

HDB
Jolle

Re: Folding lines on "octets"

Jolle Carlestam-2
On 15 Jul 2014, at 00:53, Jolle Carlestam <[hidden email]> wrote:

> I like it, but I don’t think it is fail-safe. Using words as the smallest unit runs into trouble when the wrap has to happen inside a long URL, for example. That’s where I stumbled into this to start with: a URL that was three lines long when wrapped…


Based on Ke’s suggestion I’ve worked out the following. It seems to handle all situations.

local(teststring = string('räksmörgås Lorem ipsum dolor sit amet, consectetur räksmörgås adipiscing elit. Mauris consequat ornare lectus, räksmörgås dignissim iaculis libero consequat sed. Proin räksmörgås quis magna in arcu sagittis consequat sed ac risus. Ut a pharetra dui. Phasellus molestie, mauris eget scelerisque laoreet, räksmörgås diam dolor vulputate nulla, in porta sem sem sit amet lacus. Aenean sed volutpat magna. räksmörgås Vestibulum lobortis mollis lectus, eu semper quam congue at. Donec ac ligula a neque räksmörgås tincidunt elementum. Nam urna felis, interdum non ullamcorper eget, commodo viverra ligula. räksmörgås Fusce cursus dolor in nisl tincidunt non sagittis libero elementum. Maecenas rhoncus ornare räksmörgås gravida. Nullam luctus pulvinar lorem, laoreet aliquet räksmörgås massa malesuada eget slut nu.'))

local(out = bytes, delim = '\r\n ', counter = 0)

with char in #teststring -> eachCharacter do {

        #counter += bytes(#char) -> size
        if(#counter >= 73) => {
                #out -> append(#delim + #char)
                #counter = 0
        else
                #out -> append(#char)
        }
}

#out

Result:
räksmörgås Lorem ipsum dolor sit amet, consectetur räksmörgås adipi
 scing elit. Mauris consequat ornare lectus, räksmörgås dignissim iaculi
 s libero consequat sed. Proin räksmörgås quis magna in arcu sagittis co
 nsequat sed ac risus. Ut a pharetra dui. Phasellus molestie, mauris eget s
 celerisque laoreet, räksmörgås diam dolor vulputate nulla, in porta sem
  sem sit amet lacus. Aenean sed volutpat magna. räksmörgås Vestibulum l
 obortis mollis lectus, eu semper quam congue at. Donec ac ligula a neque r
 äksmörgås tincidunt elementum. Nam urna felis, interdum non ullamcorper
 eget, commodo viverra ligula. räksmörgås Fusce cursus dolor in nisl tin
 cidunt non sagittis libero elementum. Maecenas rhoncus ornare räksmörgå
 s gravida. Nullam luctus pulvinar lorem, laoreet aliquet räksmörgås mas
 sa malesuada eget slut nu.


I am happy for any suggestions on improvements.

HDB
Jolle

Re: Folding lines on "octets"

Kyle Jessup-2
Files, strings, bytes all have ->eachSub(::integer)
Used as so:

(with sub in #teststring->asBytes->eachSub(74)
        select #sub)->join('\n')->asBytes

This particular snippet does the to-string <-> to-bytes encoding twice which is a bit wasteful and I’m sure could be improved.

-Kyle



Re: Folding lines on "octets"

Jolle Carlestam-2
On 15 Jul 2014, at 03:31, Kyle Jessup <[hidden email]> wrote:

> Files, strings, bytes all have ->eachSub(::integer)
> Used as so:
>
> (with sub in #teststring->asBytes->eachSub(74)
> select #sub)->join('\n')->asBytes
>
> This particular snippet does the to-string <-> to-bytes encoding twice which is a bit wasteful and I’m sure could be improved.
>
> -Kyle

Short and compact, that’s good. Unfortunately it does not fulfill the requirements. If the line feed is inserted based on byte count only, it may just as well land in the middle of a double-byte char, as illustrated here:
local(teststring = string('räksmörgås Lorem ipsum dolor sit amet, consectetur räksmörgåse äåöäåöäåöäåö adipiscing elit. Mauris consequat ornare lectus.'))

(with sub in #teststring -> asBytes -> eachSub(74)
        select #sub) -> join('\r\n ') -> asBytes

->
räksmörgås Lorem ipsum dolor sit amet, consectetur räksmörgåse äå�
 �äåöäåöäåö adipiscing elit. Mauris consequat ornare lectus.

The code needs to take into consideration both that no line can be longer than 75 octets (bytes) and that any inserted line feed is placed between whole chars.
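
The two requirements can be illustrated with a short Python sketch (not Lasso). It first reproduces the broken character that a plain byte cut gives, then shows one way to move a cut back to a character boundary. The helper name safe_cut is made up for this example; it relies on the fact that UTF-8 continuation octets always have the bit pattern 10xxxxxx.

```python
data = "\u00e4\u00e5\u00f6".encode("utf-8")   # "äåö", two octets per character

# Cutting after 3 octets lands inside the second character.
broken = data[:3].decode("utf-8", errors="replace")
print(broken == "\u00e4\ufffd")   # True: the second character became U+FFFD

def safe_cut(buf: bytes, limit: int) -> int:
    """Return the largest cut position <= limit that is not inside a character."""
    cut = min(limit, len(buf))
    # Step back while the octet at the cut position is a continuation octet.
    while 0 < cut < len(buf) and (buf[cut] & 0xC0) == 0x80:
        cut -= 1
    return cut

cut = safe_cut(data, 3)
print(cut, data[:cut].decode("utf-8"))   # 2 ä
```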

HDB
Jolle

Re: Folding lines on "octets"

Richard Taubo
On Jul 15, 2014, at 9:19 AM, Jolle Carlestam <[hidden email]> wrote:

> The code needs to take into consideration both that no line can be longer than 75 octets (bytes) and that any inserted line feed is placed between whole chars.

Just curious: Is it common to split on 75th byte and not the space closest to the 75th byte?
In the cases where a user inputs a newline in the text, there is the slight 1% chance (or so)
that this will happen on the 75th octet . . .
Might be better to split on the 75th octet anyway, but just thinking out loud :-)

Richard Taubo

Re: Folding lines on "octets"

stevepiercy
In reply to this post by Jolle Carlestam-2
On 7/15/14 at 9:19 AM, [hidden email] (Jolle Carlestam) pronounced:

>The code needs to take into consideration both that no line
>can be longer than 75 octets (bytes) and that any inserted
>line feed is placed between whole chars.

Thinking out loud, how about this algorithm in pseudocode?

* grab no more than 75 bytes from the start of the string and
make that the current segment
* in the current segment, count backward until you find a
space character
* if there is no space character in the segment (e.g., a long
URL), then insert a CRLF after some "proper" character (e.g.,
"/" or "-")
* else replace the trailing space with a CRLF
* prepend the portion after the CRLF back onto the
remaining string
* next iteration

I'm not sure whether the CRLF counts as a character in the RFC.

Unfortunately the bytes methods are poorly documented, so you
might have to dig into the source code to figure out what they
do.  Maybe bytes->getrange?
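
A rough Python sketch of the steps listed above follows; it is an illustration only, not Lasso and not code from this thread. It deviates from the list in two labeled ways. It breaks after the space instead of replacing it, because an unfolder that strips CRLF plus one space would otherwise swallow the word gap. When a segment has no space, it simply breaks at the last whole character that fits. The function name and the assumption that the leading space of a continuation line counts toward the 75 octets are mine.

```python
def fold_prefer_spaces(text: str, limit: int = 75) -> str:
    """Fold text so no line exceeds `limit` UTF-8 octets, breaking after a space when possible."""
    if limit < 5:
        raise ValueError("limit must leave room for a space plus a 4-octet character")
    lines = []
    first = True
    while text:
        # Assumption: the leading space of a continuation line counts toward the limit.
        budget = limit if first else limit - 1
        used = 0
        end = 0
        # Take the longest prefix of whole characters that fits the budget.
        for ch in text:
            width = len(ch.encode("utf-8"))
            if used + width > budget:
                break
            used += width
            end += 1
        if end < len(text):
            # Count backward for a space and keep it on this line, so unfolding is lossless.
            space = text.rfind(" ", 0, end)
            if space > 0:
                end = space + 1
        lines.append(text[:end])
        text = text[end:]
        first = False
    return "\r\n ".join(lines)

# Hypothetical sample: Swedish words followed by a long URL-like token without spaces.
sample = "r\u00e4ksm\u00f6rg\u00e5s " * 12 + "http://www.example.com/" + "x" * 100
folded = fold_prefer_spaces(sample)
for line in folded.split("\r\n"):
    print(len(line.encode("utf-8")), repr(line))
```

Removing every CRLF-plus-space pair from the folded text gives back the sample unchanged, which is the property the format needs.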

--steve


-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Steve Piercy              Website Builder              Soquel, CA
<[hidden email]>               <http://www.StevePiercy.com/>


Re: Folding lines on "octets"

Jolle Carlestam-2
In reply to this post by Richard Taubo

On 15 Jul 2014, at 10:30, Richard Taubo <[hidden email]> wrote:

> On Jul 15, 2014, at 9:19 AM, Jolle Carlestam <[hidden email]> wrote:
>>
>> The code needs to take into consideration both that no line can be longer than 75 octets (bytes) and that any inserted line feed is placed between whole chars.
>
> Just curious: Is it common to split on 75th byte and not the space closest to the 75th byte?
> In the cases where a user inputs a newline in the text, there is the slight 1% chance (or so)
> that this will happen on the 75th octet . . .
> Might be better to split on the 75th octet anyway, but just thinking out loud :-)
>
> Richard Taubo

From what I’ve gathered reading the specs and examining existing implementations, the split is always on the char closest to the 75th byte. Any manually inserted line feed is replaced with a \n and then counted in the line-splitting process.

When an iCalendar object is used, the process always strips out any CRLF followed by a space, thus rejoining the long rows.
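
The stripping step can be sketched in a few lines of Python (an illustration, not Lasso). RFC 5545 describes unfolding as removing each CRLF that is immediately followed by a single space or tab; the function name and the sample property line are chosen for the example.

```python
import re

def unfold(data: str) -> str:
    """Remove each CRLF that is immediately followed by one space or tab."""
    return re.sub(r"\r\n[ \t]", "", data)

# The second fold is followed by two spaces: one marks the fold, one is content.
folded = "DESCRIPTION:This is a lo\r\n ng description\r\n  that exists on a long line."
print(unfold(folded))
# DESCRIPTION:This is a long description that exists on a long line.
```

Because only one whitespace character is removed per fold, a fold may fall in the middle of a word without changing the restored text.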

HDB
Jolle

Re: Folding lines on "octets"

Richard Taubo

On Jul 15, 2014, at 10:49 AM, Jolle Carlestam <[hidden email]> wrote:

> From what I’ve gathered reading the specs and examining existing implementations, the split is always on the char closest to the 75th byte. Any manually inserted line feed is replaced with a \n and then counted in the line-splitting process.
>
> When an iCalendar object is used, the process always strips out any CRLF followed by a space, thus rejoining the long rows.

Okay, thanks :-)

Richard Taubo

Re: Folding lines on "octets"

Jolle Carlestam-2
In reply to this post by stevepiercy
On 15 Jul 2014, at 10:46, Steve Piercy - Website Builder <[hidden email]> wrote:

> Thinking out loud, how about this algorithm in pseudocode?
>
> * grab no more than 75 bytes from the start of the string and
> make that the current segment
> * in the current segment, count backward until you find a
> space character
> * if there is no space character in the segment (e.g., a long
> URL), then insert a CRLF after some "proper" character (e.g.,
> "/" or "-")
> * else replace the trailing space with a CRLF
> * prepend the portion after the CRLF back onto the
> remaining string
> * next iteration
>
> I'm not sure whether the CRLF counts as a character in the RFC.
>
> Unfortunately the bytes methods are poorly documented, so you
> might have to dig into the source code to figure out what they
> do.  Maybe bytes->getrange?

I will republish my present solution, seeing that it did not get any attention. I did look into getrange, but it had the same issue as the other proposed solutions: it could grab a range that ends in the middle of a multi-byte char.
The iCalendar spec does not care about inserting the CRLF followed by a space somewhere pretty, like between words. This is for transport only, not presentation. Before any content is shown, for example in a calendar, all occurrences of CRLF followed by a space are stripped, restoring the long lines to their original look.

Here’s what I’ve come up with that is working:

local(out = bytes, delim = '\r\n ', counter = 0)

with char in #teststring -> eachCharacter do {

        #counter += bytes(#char) -> size
        if(#counter >= 73) => {
                #out -> append(#delim + #char)
                #counter = 0
        else
                #out -> append(#char)
        }
}

Some comments.
This will insert the CRLF followed by a space between chars but never inside a multi-byte char.
I trigger the insertion of the CRLF followed by a space when I reach 73 bytes, since that allows for the following char being a multi-byte char. The spec does not mind lines being cut shorter than 75 octets, as any occurrence of CRLF followed by a space is removed when the content is read. The only requirement is that no row is longer than 75 octets plus the CRLF. I am assuming that there will be no chars more than 2 bytes long. Since I mainly need this to work for the Swedish character set, that is a safe assumption.
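
The loop above depends on no character being wider than two octets. As a hedged illustration in Python (not Lasso, and not the code used in this thread), the same character-by-character idea can test the limit before appending instead of after. That way the width of the next character no longer matters, and the 73-byte safety margin is not needed. The function name is made up, and counting the leading space toward the 75 octets is an assumption on the cautious side.

```python
def fold(text: str, limit: int = 75) -> bytes:
    """Fold text into lines of at most `limit` UTF-8 octets, never splitting a character."""
    out = bytearray()
    line_octets = 0
    for ch in text:
        encoded = ch.encode("utf-8")
        # Test before appending, so a wide character never pushes the line over the limit.
        if line_octets + len(encoded) > limit:
            out += b"\r\n "
            line_octets = 1   # the leading space belongs to the new line
        out += encoded
        line_octets += len(encoded)
    return bytes(out)

# Hypothetical sample mixing 1-, 2-, 3- and 4-octet characters.
sample = "r\u00e4ksm\u00f6rg\u00e5s \u20ac \U0001F600 " * 20
folded = fold(sample)
longest = max(len(line) for line in folded.split(b"\r\n"))
print(longest <= 75)   # True
```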

HDB
Jolle

Re: Folding lines on "octets"

Kyle Jessup-2
In reply to this post by Jolle Carlestam-2

On Jul 15, 2014, at 3:19 AM, Jolle Carlestam <[hidden email]> wrote:

> 15 Jul 2014 at 03:31, Kyle Jessup <[hidden email]> wrote:
>
>> Files, strings, bytes all have ->eachSub(::integer)
>> Used as so:
>>
>> (with sub in #teststring->asBytes->eachSub(74)
>> select #sub)->join('\n')->asBytes
>>
>> This particular snippet does the to-string <-> to-bytes encoding twice which is a bit wasteful and I’m sure could be improved.
>>
>> -Kyle
>
> Short and compact, that’s good. Unfortunately it does not fulfill the requirements. If the line feed is inserted based on byte count alone, it may well land in the middle of a double-byte char.

Gotcha. UTF-8 character sequences can be up to 4 bytes (6 in the original, pre-RFC 3629 definition), FWIW.
-Kyle

> As illustrated here:
> local(teststring = string('räksmörgås Lorem ipsum dolor sit amet, consectetur räksmörgåse äåöäåöäåöäåö adipiscing elit. Mauris consequat ornare lectus.'))
>
> (with sub in #teststring -> asBytes -> eachSub(74)
> select #sub) -> join('\r\n ') -> asBytes
>
> ->
> räksmörgås Lorem ipsum dolor sit amet, consectetur räksmörgåse äå�
> �äåöäåöäåö adipiscing elit. Mauris consequat ornare lectus.
>
> The code needs to take into consideration both that no line can be longer than 75 octets (bytes) and that any inserted line feed is placed between proper chars.
>
> HDB
> Jolle

Re: Folding lines on "octets"

Kyle Jessup-2

On Jul 15, 2014, at 7:04 AM, Kyle Jessup <[hidden email]> wrote:

>
> Gotcha. UTF-8 character sequences can be up to 4 bytes (6 in the original, pre-RFC 3629 definition), FWIW.

You might need to decode character by character and see if the resulting bytes for the char exceed the accumulated line length. If it does then push a line feed and then append to the result bytes.

Lasso internally caches the UTF-8 converter objects making this not as inefficient as it would be otherwise.

local(teststring = string('räksmörgås Lorem ipsum dolor sit amet, consectetur räksmörgås adip\:SNOWMAN: iscing elit.'))
local(result = bytes, count = 0, max = 74)

with char in #teststring->eachCharacter
let b = #char->asBytes
let size = #b->size
do {
        if (#count + #size > #max) => {
                #result->append('\n')
                // The char appended below starts the new line, so count it
                #count = #size
        else
                #count += #size
        }
        #result->append(#b)
}

#result

The SNOWMAN is a 3-byte UTF-8 character. In the result you’ll see it shifted to the next line when placed back two logical characters from the end.
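
One way to sanity-check a folded result is sketched below in Python (not Lasso). The name `check_folded` is hypothetical. It assumes the lines are separated by CRLF as the iCalendar spec requires, whereas the snippet above uses a bare '\n'.

```python
def check_folded(data: bytes, limit: int = 75) -> bool:
    """True if every CRLF-separated line fits in `limit` octets and is valid UTF-8."""
    for line in data.split(b"\r\n"):
        if len(line) > limit:
            return False
        try:
            line.decode("utf-8")  # fails if a multi-byte char was cut in two
        except UnicodeDecodeError:
            return False
    return True

snowman = "\u2603".encode("utf-8")  # 3 bytes in UTF-8
prefix = ("a" * 72).encode("utf-8")

good = prefix + b"\r\n " + snowman                 # fold placed before the char
bad = prefix + snowman[:2] + b"\r\n " + snowman[2:]  # fold placed inside the char

print(check_folded(good), check_folded(bad))  # prints "True False"
```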

-Kyle


Re: Folding lines on "octets"

Jolle Carlestam-2
16 Jul 2014 at 02:28, Kyle Jessup <[hidden email]> wrote:

> You might need to decode character by character and see if the resulting bytes for the char exceed the accumulated line length. If it does then push a line feed and then append to the result bytes.
>
> Lasso internally caches the UTF-8 converter objects making this not as inefficient as it would be otherwise.
>
> -Kyle

Cool!
Thanks!

HDB
Jolle