Mini-review: LineByter - Find and Extract Patterns (emails,etc) from Text Files

LineByter is a utility designed to find and extract patterns from text files.

It's a brand new (free) program coded by DonationCoder member Carl Danley (CodeByter) and released today.

It includes some unique features, like duplicate removal, support for multiple match and reject patterns, and the ability to save and load profiles, that make it ideal for repeated jobs like extracting emails or URLs from text files.

Motivation for the program

When we send out the DonationCoder mailing list, a certain number of the newsletter emails bounce back each month as undeliverable.  I use phplist to manage the web mailing list, but lately I've been handling the bounces by hand: I export the bounced emails from my email program, run an email extraction utility on the export to get a list of addresses, and then feed that list into a script that turns off email notification for those users on the forum whose emails can be found.

In the past I've been using a now-discontinued utility designed specifically to extract emails.  But it's less than ideal.  It's a bit clunky, it sometimes finds things that aren't emails, and it sometimes misses real emails.  It also has a bad user interface and doesn't remove duplicates.  After running this utility I would bring the output file into a text editor, sort and remove duplicates, and then go through and remove certain emails, like those that are really donationcoder.com addresses and a few known fake email address patterns that seem to show up regularly.

So that's why I have been wanting, for a while now, a little utility that is better at extracting emails and automatically does some of the things I have been doing manually.  Of course I could have written a little Perl or Python script for it, but I am a big fan of custom GUI tools for such things.

LineByter is the program that emerged from my discussions with Carl about this idea.  It's actually a much more general-purpose program that can extract and reject all kinds of regular expression patterns, but it's also designed to be really easy to use, and it's focused closely enough on the workflow I described above that it's a real joy to use for this kind of stuff.


Features

Some key features of the program:
  • You can drag and drop as many files to scan as you want.
  • Nice progress bar so you can see how much more time it's going to take.
  • Supports a preset library of regular expressions, so you can easily select common patterns and add your own presets -- this is super important for letting you quickly reuse patterns, and it makes the program suitable even for those who don't understand regular expression syntax.
  • Lets you specify a list of multiple patterns to search for and how to extract the data you want from each match.
  • Lets you specify a list of additional patterns which should be rejected even if they match the first list (ideal if you want to find and extract all email patterns except those with certain properties).
  • Shows a nice complete report of why each pattern was found and/or rejected.
  • Automatically removes duplicates.
  • Produces a final list of results in text form that can be copied to clipboard or saved to file.
  • Can save and load profiles so you can reuse configuration settings for common jobs you perform.
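
For readers who think in scripts, the match / reject / de-duplicate workflow behind these features can be roughly sketched in a few lines of Python. This is only an illustration of the general idea, not how LineByter itself is implemented; the function name `extract`, the simplified email regular expression, and the donationcoder.com reject pattern are all assumptions made up for the example.

```python
import re

# Hypothetical patterns for illustration only; LineByter's own presets may differ.
# A deliberately simple email pattern (it will not cover every valid address).
MATCH_PATTERNS = [re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")]
# Matches that also hit any of these patterns are thrown away.
REJECT_PATTERNS = [re.compile(r"@donationcoder\.com$", re.IGNORECASE)]


def extract(lines, match_patterns=MATCH_PATTERNS, reject_patterns=REJECT_PATTERNS):
    """Return (results, report).

    results: unique accepted matches, in first-seen order.
    report:  one (text, reason) entry for every match that was found.
    """
    seen = set()      # lowercased matches already accepted
    results = []
    report = []
    for line in lines:
        for pattern in match_patterns:
            for found in pattern.findall(line):
                # Name of the first reject pattern that hits, or None.
                rejected_by = next(
                    (r.pattern for r in reject_patterns if r.search(found)), None
                )
                key = found.lower()  # compare case-insensitively for duplicates
                if rejected_by is not None:
                    report.append((found, "rejected by " + rejected_by))
                elif key in seen:
                    report.append((found, "duplicate"))
                else:
                    seen.add(key)
                    results.append(found)
                    report.append((found, "accepted"))
    return results, report


if __name__ == "__main__":
    sample = [
        "Delivery failed for Bob@Example.com and bob@example.com.",
        "From: mailer@donationcoder.com",
        "alice@test.org",
    ]
    results, report = extract(sample)
    print("\n".join(results))
    for text, reason in report:
        print(text, "->", reason)
```

A script like this covers the basic pipeline, but it has no drag and drop, no progress bar, and no saved profiles, which is exactly the convenience a GUI tool adds.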



