Creating a regex for finding credit card numbers with grep

18 Apr 2008

Posted by acrollet

Ugly regex:

grep '\(^\|[^0-9]\)\{1\}\([345]\{1\}[0-9]\{3\}\|6011\)\{1\}[-]\?[0-9]\{4\}[-]\?[0-9]\{2\}[-]\?[0-9]\{2\}-\?[0-9]\{1,4\}\($\|[^0-9]\)\{1\}' filename

Lately, I've been working with the security team where I'm employed to catch people storing information they shouldn't on our database server (SSNs, credit card numbers, etc.). This involved dumping all our databases into a flat file (about a gig of text) and doing some mining. I was given a pre-built regex, but it didn't work with grep, and I'm enough of a command-line geek that I'd rather do things 'my way' than just write a Perl script or something. So I had to make my own regex to find credit card numbers, because I didn't find anything effective in a quick search online. This brings to mind a hoary old chestnut of a quote - over-used, but still very true:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." 
Now they have two problems. 

-Jamie Zawinski, in comp.lang.emacs

All that notwithstanding, I found a few resources that came in handy for this project.

  • This PHP tutorial has a very good description of a basic spec for valid credit card numbers.
  • This page has more info on valid numbers, and test numbers for all the major companies.
  • Txt2regex is a handy command-line utility that asks what you want to match and builds regexes in many different formats, step by step. It didn't do everything for me, but it can be a handy way to get close to what you want without having to dive into the man pages...
  • Handy article on GNU grep from Linux.com - more info just below.
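
Test numbers like the ones mentioned above make for a quick sanity check. Here is a rough sketch, assuming GNU grep: the sample values are commonly published test numbers rather than anything from the pages linked here, and the variable names (cc_regex, samples, match_count) are made up for illustration.

```shell
# The regex from the top of this post, on a single line.
cc_regex='\(^\|[^0-9]\)\{1\}\([345]\{1\}[0-9]\{3\}\|6011\)\{1\}[-]\?[0-9]\{4\}[-]\?[0-9]\{2\}[-]\?[0-9]\{2\}-\?[0-9]\{1,4\}\($\|[^0-9]\)\{1\}'

# Four commonly published test numbers (assumed, not real accounts), plus two
# lines that should NOT match: a wrong leading digit and a 20-digit run.
samples='4111111111111111
5105105105105100
378282246310005
6011111111111117
1234567890123456
41111111111111111111'

# -c prints the number of matching lines instead of the lines themselves.
match_count=$(printf '%s\n' "$samples" | grep -c "$cc_regex")
echo "matching lines: $match_count"
```

If the regex behaves as intended, only the first four lines should be counted.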

Once you think you've got your regex built, GNU grep has some options, not available in every version of grep, that are very handy for debugging. The -o switch outputs only the matched part of each line.
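
A small sketch of -o in action, assuming GNU grep. The sample line and the variable names (line, only_match) are invented for illustration. The brackets in the printf are only there to make whitespace visible.

```shell
# The regex from the top of this post, on a single line.
cc_regex='\(^\|[^0-9]\)\{1\}\([345]\{1\}[0-9]\{3\}\|6011\)\{1\}[-]\?[0-9]\{4\}[-]\?[0-9]\{2\}[-]\?[0-9]\{2\}-\?[0-9]\{1,4\}\($\|[^0-9]\)\{1\}'

# A made-up line with a dashed test number buried in other text.
line='order 17: card 5105-1051-0510-5100, shipped'

# -o prints only the part of the line that the regex matched.
only_match=$(printf '%s\n' "$line" | grep -o "$cc_regex")
printf '[%s]\n' "$only_match"
```

Note that the match includes the non-digit characters on either side of the number (here a leading space and a trailing comma), because the [^0-9] groups at both ends of the regex consume one character each.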

The --color switch will highlight matches - very handy when you're searching through dense SQL dumps!
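
A quick sketch of the color option, again assuming GNU grep, with a made-up SQL line and illustrative variable names. On its own, --color highlights only when the output goes to a terminal, so this example uses --color=always to keep the highlighting when the output is captured or piped (to less -R, say).

```shell
# The regex from the top of this post, on a single line.
cc_regex='\(^\|[^0-9]\)\{1\}\([345]\{1\}[0-9]\{3\}\|6011\)\{1\}[-]\?[0-9]\{4\}[-]\?[0-9]\{2\}[-]\?[0-9]\{2\}-\?[0-9]\{1,4\}\($\|[^0-9]\)\{1\}'

# A made-up line of the sort you might find in a SQL dump.
line='INSERT INTO orders VALUES (17, 4111111111111111, 2008);'

# --color=always wraps each match in terminal escape sequences even when
# grep's output is not going straight to a terminal.
colored=$(printf '%s\n' "$line" | grep --color=always "$cc_regex")
printf '%s\n' "$colored"
```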

Comments

I am trying to understand what the \(^\|[^0-9]\)\{1\} and \($\|[^0-9]\)\{1\} are for in the expression. The ^ and $ seem limiting, since the credit card number would have to be on its own line, and I'm not sure what the [^0-9] at the start and end are for.

So the following would seem to be more useful:

echo "5105105105105100" |grep -o '\([345]\{1\}[0-9]\{3\}\|6011\)\{1\}[ -]\?[0-9]\{4\}[ -]\?[0-9]\{2\}[-]\?[0-9]\{2\}[ -]\?[0-9]\{1,4\}'

