193. Valid Phone Numbers

Description

Given a text file file.txt that contains list of phone numbers (one per line), write a one liner bash script to print all valid phone numbers.

You may assume that a valid phone number must appear in one of the following two formats: (xxx) xxx-xxxx or xxx-xxx-xxxx. (x means a digit)

You may also assume each line in the text file must not contain leading or trailing white spaces.

Example:

Assume that file.txt has the following content:

987-123-4567
123 456 7890
(123) 456-7890

Your script should output the following valid phone numbers:

987-123-4567
(123) 456-7890

My Solution

Source Code

1
2
# Read from the file file.txt and output all valid phone numbers to stdout.
grep -E "^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$" file.txt

Analysis

So this is probably going to sound terrible but I didn't actually know there were different implementations of POSIX regular expressions. I learned today there are Basic Regular Expressions and Extended Regular Expressions. ERE are what I've been using my entire programming-related life. Some of the differences between BRE and ERE include {} and () needing to be escaped in BRE and ERE having +, ?, and |. When you use grep, by default it takes a BRE. I've used grep before but only with absolutely basic regular expressions, like just matching a specific word. To use grep with a ERE, you would use grep -E.

Anyway, the regular expression I came up is not that complicated. Basically I started with the simplest test case, a number formatted xxx-xxx-xxxx and worked my way up.

^[0-9]{3}-[0-9]{3}-[0-9]{4}$ matches xxx-xxx-xxxx

/(?[0-9]{3}/)?-[0-9]{3}-[0-9]{4} will also match (xxx)-xxx-xxxx which isn't actually valid but it's a start.

Area codes that are wrapped in () must be followed by a space. Likewise area codes not wrapped in () must be followed by hyphen. To match this pattern, I used:

(\([0-9]{3}\) |[0-9]{3}-)

Putting it all together:

^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$

And that's about it.