Sed to extract email addresses from file


sed -e "s/^.*<\(.*\)>.*$/\1/" /file_with_emails | \
tr "[:upper:]" "[:lower:]" | sort | uniq

PS: Filip, who is terribly smart and, I guess, the only one who reads my blog, pointed out the following one-liner:

sed -nr "s/.*<([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})>.*/\L\1/pi" \
file_with_emails | sort -u
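
For anyone who would rather do this outside sed, here is a rough Python sketch of the same idea. The function name `extract_emails` is my own invention for illustration. The pattern and its 2-4 letter TLD limit are carried over from the one-liner, and unlike the sed version it collects every bracketed address on a line rather than just one.

```python
import re

# Address inside angle brackets, same character classes as the sed one-liner.
# re.IGNORECASE plays the role of sed's case-insensitive flag.
EMAIL_IN_BRACKETS = re.compile(
    r"<([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})>", re.IGNORECASE
)


def extract_emails(lines):
    """Return sorted, lowercased, de-duplicated addresses found in <...>."""
    found = set()
    for line in lines:
        # Lowercase before de-duplicating, so A@x.org and a@x.org count once.
        found.update(addr.lower() for addr in EMAIL_IN_BRACKETS.findall(line))
    return sorted(found)
```

Any iterable of lines works as input, so an open file handle for `file_with_emails` can be passed in directly.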

Comments:

Filip - Jun 4, 2012

Hi. This one does the lowercase conversion immediately and lists only e-mails in a valid format.

sed -nr "s/.*.*/\L\1/pi" file_with_emails | sort -u

Filip - Jun 4, 2012

Sorry, WP isn’t a friend of my sed one-liner. :-)

David Hrbáč - Jun 6, 2012

Updated the post with your great one-liner. Thanks!

Max - Sep 0, 2012

Any idea how to extract data from an email source (headers and body)? I need an output of only: Date:, From:, To:, Subject: and BODY (the body is separated from the headers by the first blank line, and I want everything after it to be included in BODY). I think it is simple with sed for an expert, but I don’t know anything about it and it seems quite difficult to get started quickly. A suggestion for a good and simple guide would also be appreciated.

The source data will look like: Return-Path:… Received:… …… …… From: … Date: … Subject: … To: … … … BODY

Ferran - May 5, 2014

I’ve tried your regex on about 7k mails and it brought up about 1k e-mail addresses; this is the regex I ended up using:

sed -e 's/.*<\([^:]\+\)>\s*$/\L\1/'

@Max, use mail libs; a python example:

$ cat extract_froms.py
#!/usr/bin/python2
import sys
import email

files = sys.argv[1:]
for fn in files:
    msg = email.message_from_file(open(fn))
    print msg['from'].replace('\n','')
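
For Max's question, the mail-library route could look roughly like the Python 3 sketch below. It uses the standard `email` package. The function name `summarize_message` and the exact output layout (the four headers, then a `BODY` line, then the text) are assumptions made for illustration, and for multipart mail it only picks the plain-text part.

```python
import email
from email import policy

# Headers Max asked for, in the order he listed them.
WANTED_HEADERS = ("Date", "From", "To", "Subject")


def summarize_message(raw_text):
    """Return the wanted headers, a BODY marker line, and the body text."""
    # policy.default gives an EmailMessage with get_body()/get_content().
    msg = email.message_from_string(raw_text, policy=policy.default)
    lines = [f"{name}: {msg.get(name, '')}" for name in WANTED_HEADERS]
    # Prefer the plain-text part; a simple message is its own body part.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    lines.append("BODY")
    lines.append(body)
    return "\n".join(lines)
```

Headers that were not requested, such as Return-Path and Received, are simply never read, so nothing needs to be filtered out by hand.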