When not to use a regex

August 13, 2017 on Drew DeVault's blog

The other day, I saw “Learn regex the easy way”. This is a great resource, but I felt the need to pen a post explaining that regexes are usually not the right approach.

Let’s do a little exercise. I googled “URL regex” and here’s the first Stack Overflow result:

https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)


This is a bad regex. Here are some valid URLs that this regex fails to match:

Here are some invalid URLs the regex is fine with:

This answer has been revised 9 times on Stack Overflow, and this is the best they could come up with. Go back and read the regex. Can you tell where each of these bugs is? How long did it take you? If you received a bug report in your application because one of these URLs was handled incorrectly, do you understand this regex well enough to fix it? If your application has a URL regex, go find it and see how it fares with these tests.
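If you want to run that experiment yourself, a small harness is enough. This is a sketch: the test URLs below were picked for illustration and are not necessarily the post's own examples, and `fullmatch` is used so the regex has to cover the whole string rather than just a prefix.

```python
import re

# The Stack Overflow regex quoted above, copied verbatim into raw strings.
URL_RE = re.compile(
    r"https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b"
    r"([-a-zA-Z0-9@:%_\+.~#?&//=]*)"
)


def regex_accepts(url):
    """True if the regex matches the entire string, not just a prefix."""
    return URL_RE.fullmatch(url) is not None


# Illustrative inputs chosen for this sketch, not the post's original lists.
VALID_BUT_REJECTED = [
    "http://localhost/",             # no dot in the host name
    "HTTP://EXAMPLE.COM/",           # scheme and TLD are matched case-sensitively
    "ftp://example.com/file.txt",    # only http and https are allowed
    "https://example.photography/",  # TLDs longer than six letters are refused
]
INVALID_BUT_ACCEPTED = [
    "http://..........com",          # a "host name" made only of dots
]

if __name__ == "__main__":
    for url in VALID_BUT_REJECTED + INVALID_BUT_ACCEPTED:
        print(f"{url!r}: regex accepts = {regex_accepts(url)}")
```

Each comment names the part of the regex responsible, which is roughly the detective work the exercise asks you to do by hand.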

Complicated regexes are opaque, unmaintainable, and often wrong. The correct approach to validating a URL is as follows:

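from urllib.parse import urlparse

def is_url_valid(url):
    try:
        result = urlparse(url)
    except ValueError:
        # urlparse raises ValueError on some malformed input,
        # e.g. an unclosed IPv6 bracket in "http://[::1"
        return False
    # urlparse accepts almost any string without raising,
    # so also require a scheme and a network location (host)
    return bool(result.scheme and result.netloc)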
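One caveat when leaning on `urlparse`: it is deliberately lenient. `urlparse("not a url")` returns a result with an empty scheme and netloc instead of raising, so the actual validation comes from inspecting the parsed fields. As a sketch, assuming you only care about absolute http(s) URLs, a stricter check might look like this (the name `is_web_url` is hypothetical):

```python
from urllib.parse import urlparse


def is_web_url(url):
    """Hypothetical helper: accept only absolute http(s) URLs that name a host."""
    try:
        parts = urlparse(url)  # raises ValueError for e.g. "http://[::1"
        _ = parts.port         # reading .port raises ValueError if the port is out of range or not numeric
    except ValueError:
        return False
    # urlparse lowercases the scheme; .hostname is None when there is no host.
    # Note that urlparse does not check that the host name itself is well formed.
    return parts.scheme in ("http", "https") and bool(parts.hostname)
```

With this, `is_web_url("HTTP://EXAMPLE.COM/")` and `is_web_url("http://localhost/")` are accepted, while `is_web_url("ftp://example.com/")` and `is_web_url("http://example.com:99999/")` are not. Which schemes and components to require is a policy decision for your application; the parser just hands you the pieces to decide with.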

A regex is useful for validating simple patterns and for finding patterns in text. For anything beyond that it’s almost certainly a terrible choice. Say you want to…

validate an email address: try to send an email to it!

validate password strength requirements: estimate the complexity with zxcvbn!

validate a date: use your standard library! In Python, that's datetime.datetime.strptime.

validate a credit card number: run the Luhn algorithm on it!

validate a social security number: alright, use a regex. But don’t expect the number to be assigned to someone until you ask the Social Security Administration about it!
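The date, credit card, and social security number checks from the list above take only a few lines each in Python. This sketch assumes ISO-style YYYY-MM-DD dates by default and US-style NNN-NN-NNNN social security numbers; the function names are hypothetical. The SSN check only rules out numbers the SSA never issues (area 000, 666 or 900-999, group 00, serial 0000), so it still says nothing about whether a number is actually assigned.

```python
import re
from datetime import datetime


def is_valid_date(text, fmt="%Y-%m-%d"):
    """Let the standard library decide; strptime rejects impossible dates like 2017-02-30."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


def luhn_ok(number):
    """Luhn checksum, ignoring spaces and hyphens used as separators."""
    digits = number.replace(" ", "").replace("-", "")
    if not re.fullmatch(r"[0-9]{2,}", digits):  # a simple pattern: a fine job for a regex
        return False
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the rightmost
            d *= 2
            if d > 9:   # same as adding the two digits of the product
                d -= 9
        total += d
    return total % 10 == 0


SSN_RE = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")


def ssn_well_formed(text):
    """Format check plus the number ranges the SSA never issues."""
    m = SSN_RE.fullmatch(text)
    if m is None:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"
```

For example, `luhn_ok("4111 1111 1111 1111")` (a widely used test card number) is True, while changing the last digit to 2 makes it False. Note that the Luhn check only catches typos; it cannot tell you whether a card exists.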

Get the picture?
