String RegEx Check


Marc Pope-2
Never been good at RegEx.

Goal is to test a string to make sure it only has legal characters for a simple filename (A-Z, a-z, 0-9, and _ or -, dashes preferable).

I know that \w covers a-z, A-Z, 0-9 and _, but I'd want dashes to be valid too. No periods. No spaces. No other random characters.

No idea how to test a variable for the condition in Lasso either.

Thanks
Marc Pope


#############################################################

This message is sent to you because you are subscribed to
  the mailing list Lasso [hidden email]
Official list archives available at http://www.lassotalk.com
To unsubscribe, E-mail to: <[hidden email]>
Send administrative queries to  <[hidden email]>

Re: String RegEx Check

Carl Ketterling
The RegEx for the string (I think) would be this:

"^[a-zA-Z0-9_-]*$"

From what I read, the hyphen doesn’t have to be escaped if it’s the last item in the list.

I didn’t understand how to put that in Lasso after looking at the Language Reference pages.

Carl



> On Nov 9, 2017, at 11:19 AM, Marc Pope <[hidden email]> wrote:
>
> Never been good at RegEx.
>
> Goal is to test a string to make sure it only has legal characters for a simple filename.  (A-Z, a-z 0-9 and  _ - dashes preferable)
>
> I know that \w works for a-z A-Z and 0-9 and _  but I'd want dashes as valid too. No periods. No spaces. No other random characters.
>
> No idea how to test a variable for the condition in Lasso either.
>
> Thanks
> Marc Pope

Re: String RegEx Check

Bil Corry-3
In reply to this post by Marc Pope-2
To test:

var('filename') = 'file_123';
if(string_findregexp(#file, -find='[^\\w\\-]')->size > 0);
  'Warning: illegal characters used';
else;
  'All good!';
/if;

The other option is to just strip out the bad characters, and replace them
with dashes:

var('filename') = 'file_123 ..{}\\\ will this work?';
$filename = string_replaceregexp(#file, -find='[^\\w\\-]', -replace='-');
'The new filename is '+$filename;


- Bil



Re: String RegEx Check

Jolle Carlestam-2
Na, Bil. A bit hasty on your copy paste exercise? That example will not work.

HDB
Jolle

Sent from a mobile device. Any anomalies is due to Autocorrect.

> On 9 Nov 2017 at 19:09, Bil Corry <[hidden email]> wrote:
>
> var('filename') = 'file_123';
> if(string_findregexp(#file, -find='[^\\w\\-]')->size > 0);
>  'Warning: illegal characters used';
> else;
>  'All good!';
> /if;



Re: String RegEx Check

Bil Corry-3
I don't have Lasso set up on my new desktop to test, but it should be close.


- Bil



Re: String RegEx Check

Jolle Carlestam-2
Lasso is not very understanding when you create a variable called filename and then call a local called file. But besides that I think your code provides the answer for the question.
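
For reference, here is a sketch of Bil's check with that mismatch fixed. It keeps the classic syntax from his post and has not been run here. The only change is that the variable created with var('filename') is read back as $filename instead of the undefined local #file:

```lasso
// Untested sketch: Bil's check with the variable reference corrected.
var('filename') = 'file_123';
// [^\w\-] matches any character that is neither a word character nor a hyphen.
if(string_findregexp($filename, -find='[^\\w\\-]')->size > 0);
  'Warning: illegal characters used';
else;
  'All good!';
/if;
```

If \w turns out to be Unicode-aware in Lasso's regex engine (an assumption worth testing), the explicit class [^a-zA-Z0-9_-] from Carl's pattern is the stricter choice for an ASCII-only check.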

Now, as a side note, I’m of the opinion that the question was wrong. Or at least not useful in a real-world situation for those of us who are not English speakers.
Files can have all sorts of names nowadays. The time when we were restricted to 8 ASCII chars is long gone.
Some examples that would all fail the check asked for:
Bröllopsminnen
Günters Österreich reise
Åshöjdens räksmörgåsar

No file system would reject any of these names. In fact, I think there are rather few characters that are rejected. I can think of : and / or \. What else would fail?

Would be nice to have a modern solution that works for all languages and today’s operating systems.

HDB
Jolle

Sent from a mobile device. Any anomalies is due to Autocorrect.

> On 9 Nov 2017 at 19:32, Bil Corry <[hidden email]> wrote:
>
> I don't have Lasso setup on my new desktop to test, but it should be close.



Re: String RegEx Check

Bil Corry-3
> No file system would reject any of these names. In fact I think there are
> rather few characters that are rejected. I can think of : and / or \. What
> else would fail?

Depends on the file system and some have length restrictions as well.

Here's the first hit from Google on the topic:

https://kb.acronis.com/content/39790


- Bil




Re: String RegEx Check

Marc Pope-2
In reply to this post by Jolle Carlestam-2
Good point, I decided to just use the name as a Vanity name, but use internal IDs to name the file. This way, you can use all your funny characters and I don’t care :) WIN WIN.

This is probably more secure for this solution anyway.

-Marc


> On Nov 9, 2017, at 2:41 PM, Jolle Carlestam <[hidden email]> wrote:
>
> Would be nice to have a modern solution that works for all languages and todays operative systems.



Re: String RegEx Check

Jolle Carlestam-2
In reply to this post by Bil Corry-3
Interesting read. But the fact that it sees the need to bring up OS 9 so prominently indicates that it’s dated. I have plenty of users demanding support for Swedish. None requiring support for OS 9.

Or Windows 3 for that matter.

I would venture that we can safely assume that file name length is no longer a practical issue. And that most chars are supported on all file systems in use. I’m thinking even 🏵🏆🎲😉.txt works as a file name. (Not tested)
Thus we need a filter that only weeds out the few chars that won’t work and lets us keep the rest. Not one that forces us to choose limited file names belonging to the former millennium.

HDB
Jolle

Sent from a mobile device. Any anomalies is due to Autocorrect.

> On 9 Nov 2017 at 21:05, Bil Corry <[hidden email]> wrote:
>
> Here's the first hit from Google on the topic:
>
> https://kb.acronis.com/content/39790
>



Re: String RegEx Check

Bil Corry-3
I guess it depends on how the filenames are used.  I'd hate to have to type
emojis in order to move a file from one directory to another on the command
line.  But to your point, yes, a world where anything can be used for a
filename would be good.  Likely the future will get rid of filenames
entirely.


- Bil




Re: String RegEx Check

Jolle Carlestam-2
On 9 Nov 2017 at 23:18, Bil Corry <[hidden email]> wrote:
>
> I guess it depends on how the filenames are used.  I'd hate to have to type
> emojis in order to move a file from one directory to another on the command
> line.

This intrigued me. I have never used emojis in a file name. And never seen a file name containing them. But I now saw the need for a test. I created a file 🏵🏆🎲😉.txt. My OS, macOS 10.12.6, had no problem displaying the filename, nor moving it around using Finder. I could also open the file. Over to the CLI. I tried the following commands:
jolle:Desktop jolle$ mv test/🏵🏆🎲😉.txt ./
jolle:Desktop jolle$ vi 🏵🏆🎲😉.txt

Both worked. Now, admittedly I did not try to type the file name. I have no idea how. The first command was executed using tab expansion, the second was a copy/paste operation. But that’s not the point. The key here is that the OS readily provided support for the chars in the file name. And this is not an exercise about file names that I like, or that Bil uses, but about supporting file names that any user of our web services could come up with. So it could be relevant to let our service accept any construction that the OS allows. Admittedly I think it’s far-fetched and not likely that we’ll see emoji names.
In fact I know that Lasso, surprisingly, has problems with emojis. Lasso, it turns out, can’t send emoji chars to MySQL out of the box. You have to use utf8mb4 in your MySQL tables and do some specific preparation of your Lasso-side MySQL interaction in order to handle emojis. In my projects I have decided that my users are not allowed to provide emojis in content, and I strip them out before sending the content to MySQL, using a regex I think Bil provided once. Thanks, Bil!

Ah, back on track. While emoji support would be a stretch, Swedish, German and French characters in file names are very common and should be supported by our solutions. As should Arabic, Chinese and other common languages. Where’s the regex for that?

>  But to your point, yes, a world where anything can be used for a
> filename would be good.  Likely the future will get rid of filenames
> entirely.

Interesting point and you’re probably right. We are not at that place yet, though.


BTW, I did some Linux testing as well. On a CentOS 6 server I tried emoji file names. The experience was not as pleasing as the one macOS provided. But all operations worked. I could create the file by typing vi 🏵🏆🎲😉.txt. No issues adding content and saving it in vim. The visual feedback was lacking. An ls -al resulted in
-rw-r--r--  1 root   root       25 Nov 10 06:29 ????????????????.txt
And when I wanted to remove the file by typing rm 🏵🏆🎲😉.txt, it asked me
rm: remove regular file `\360\237\217\265\360\237\217\206\360\237\216\262\360\237\230\211.txt'?
I’m not impressed.
But the important lesson was that it worked and that the OS supported the weird file name.

HDB
Jolle


Re: String RegEx Check

Jolle Carlestam-2
In reply to this post by Marc Pope-2
On 9 Nov 2017 at 21:40, Marc Pope <[hidden email]> wrote:
>
> Good point, I decided to just use the name as a Vanity name, but use internal IDs to name the file. This way, you can use all your funny characters and I don’t care :) WIN WIN.
>
> Probably more secure this way regardless for this solution.

Actually, this is what I do. Store the files using UUIDs then keep the original file name in the DB. When a file is requested I use Lasso to verify that the user has permission to access the file and then send it using file_serve supplying the original file name as name.

For files like images that are requested by the browser as part of the site experience I use a combination. Paths will look like /image/LK65SH/räksmörgås.jpg. When the image is requested, Lasso uses the LK65SH part to locate the image using a DB call. The latter part, räksmörgås.jpg, is for the user’s benefit only, so that the path will look nice.
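
A minimal sketch of how the ID part of such a path could be pulled out in Lasso 9, not run here. The names path, parts and image_id are made up for illustration, and the DB lookup and file_serve call that would follow are left out:

```lasso
// Untested sketch: extract the lookup ID from /image/<id>/<display name>.
local(path = string('/image/LK65SH/räksmörgås.jpg'))
// Drop the leading slash so the first split element is 'image'.
#path->removeleading('/')
local(parts = #path->split('/'))
// The second element is the ID; the display name after it is ignored for the lookup.
local(image_id = (#parts->size >= 2 ? #parts->get(2) | ''))
```

Because only the ID is used for the lookup, the display name can contain any characters without affecting which file is served.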

HDB
Jolle


Re: String RegEx Check

Jolle Carlestam-2
In reply to this post by Jolle Carlestam-2
On 10 Nov 2017 at 06:43, Jolle Carlestam <[hidden email]> wrote:
>
> Ah, back to the track. While emoji support would be a stretch, Swedish, German and French characters in file names are very common and should be supported by our solutions. As should Arabic, Chinese and other common languages. Where’s the regex for that?

Well, couldn’t resist. I’ve made a Lasso 9 method that will provide safe file names. My attempt looks like this:

define safe_filename(
        filename::string,
        -replacechar::string = '-',
        -clearemoji::boolean = false,
        -noleadingperiod::boolean = false,
        -allowed_length::integer = 255
) => {

        local(_filename = string(#filename))

        // spaces at start and end not allowed
        #_filename->trim
        // periods at end of filename not allowed
        #_filename->removetrailing('.')
        // period at start of filename is allowed but makes the file invisible on some OS
        #noleadingperiod ? #_filename->removeleading('.')

        // emojis can cause issues when communicating with Mysql
        #clearemoji ? #_filename->replace(regexp(`[\x{10000}-\x{10ffff}]`, #replacechar))
        // illegal chars in some or all file systems
        #_filename->replace(regexp(`[\\/:]`, #replacechar))

        while (#_filename->length > #allowed_length) => {
                local(
                        nameparts = #_filename->split('.'),
                        suffix = #nameparts->last
                )
                #suffix->length > #allowed_length ? #suffix = #suffix->substring(1, #allowed_length)
                #nameparts->remove // gets rid of the suffix
                #_filename = #nameparts->join('.')->substring(1, max(1, #allowed_length - (#suffix->length + 1)))
                #_filename->length > 0 ? #_filename->append('.')
                #_filename->append(#suffix)
        }

        return #_filename
}

I will put it on gist shortly.


This is the testing I tried.

local(
        filenames = array(
                ' Ansgar was a woman.txt',
                'Räksmörgåsen.jpeg ',
                'emoji-🏵🏆🎲😉.txt.',
                'Long file name that is so long no one would ever contemplate using it but we still need it for our testing purposes. Oh, heavy sigh, Still only half way to a name that would be too long for our test. But getting towards the end slowly and there it is! I have now passed the limit.txt',
                'Long file name with no suffix that is so long no one would ever contemplate using it but we still need it for our testing purposes - Oh, heavy sigh, Still only half way to a name that would be too long for our test - But getting towards the end slowly and there it is! I have now passed the limit'

        ),
        suspectchars = '!"#€%&/()-_.:,;<>§°¨=@©£$∞§|[]≈±´•¡”¥¢‰¶\\{}≠¿’?`´^*`•Ω鮆µ"',
        output = string
)

with char in #suspectchars->split('') do {
        #filenames->insert('char_' + #char + '.txt')
}

with filename in #filenames do {

        protect => {
                handle_error => {
                        #output->append('Error writing file ' + #filename + '\n' + error_msg + '\n')
                }

                local(safefilename = safe_filename(#filename))

                local(file = file('files/' + #safefilename))
                #file->dowithclose => {
                        #file->openwrite
                        #file->writestring(date + '\nRäksmörgås\n' + #filename)
                        #output->append('Success writing file ' + #filename + (#filename != #safefilename ? ' using ' + #safefilename) + '\n')
                }
        }
}
'<pre>'
#output
'</pre>'


On macOS all test files succeeded. On CentOS 6 the file without a suffix failed.

Would be interesting to see how it works on Windows. Also note that I did not try all possible chars, nor Arabic, Chinese or other alphabets.

HDB
Jolle


Re: String RegEx Check

Jolle Carlestam-2
On 10 Nov 2017 at 08:46, Jolle Carlestam <[hidden email]> wrote:
>
> I will put it on gist shortly.

Here it is:
https://gist.github.com/jolle-c/154d87324d40cd1c45f32a635a3c9350

HDB
Jolle


Re: String RegEx Check

Johan Solve
In reply to this post by Jolle Carlestam-2
Unfortunately, the differences in Unicode representation and normalization
create a lot of headaches on macOS with HFS+.

HFS+ represents Unicode characters as UTF-16 in fully decomposed form
(Unicode Normalization Form D, NFD), which means that an ä is represented
by the characters a and ¨ (a combining diaeresis) in sequence.

So it’s perfectly reasonable to asciify uploaded filenames or, as you later
suggest, store the filenames in the database instead.




--
Best regards
Johan Sölve
____________________________________
Montania System AB
Halmstad   Stockholm
http://www.montania.se

Johan Sölve
Mobile +46 709-51 55 70
[hidden email]

Kristinebergsvägen 17, S-302 41 Halmstad, Sweden
Phone +46 35-136800 |  Fax +46 35-136801
