Sublime Text 2: Find and Replace with Regular Expressions for Idiots

Preface

Hello there! I am an HTML and CSS expert that also hacks on server-side (PHP) and client-side (JavaScript) scripting languages. I am surprisingly productive at working with these elements even though I do not have a degree in Computer Science.

I like to use Sublime Text 2 – or whatever version – to work on my code. I want to work faster and Sublime Text’s multiple-cursor/multiple-file editing and fancy search tools are like training wheels for a person that never attempted to learn how to use Vi or Emacs.

With all of that out of the way, I’d like to say that REGULAR EXPRESSIONS ARE AWESOME. However, everyone online that tries to explain how to use them seems to think that they are talking to people that already know how to use them. It’s baffling and frustrating when you don’t have the foundation in text editing skills that everyone online assumes is common knowledge. And it’s not like you have to be a wizard to use regular expressions. If you can understand a few basic mechanisms, the rest is just vocabulary.

Find and Replace Basics

Find and Replace is a simple mechanism available in any serious text editor, as well as in word processors and design layout applications.

Find and Replace at its most basic has two fields:

  1. FIND – A set and arrangement of text characters that you are specifically looking for within a defined range of text or a document.
  2. REPLACE – A set and arrangement of text characters with which you want to overwrite the text that matches the contents of the FIND field.

In Sublime Text 2 there are four buttons associated with the Find and Replace panel:

  1. FIND – Visually highlight the first instance of text that matches the contents of the FIND field.
  2. FIND ALL – Visually highlight all instances of text that match the contents of the FIND field.
  3. REPLACE – Overwrite the first instance of text that matches the contents of the FIND field with the contents of the REPLACE field.
  4. REPLACE ALL – Overwrite all instances of text that match the contents of the FIND field with the contents of the REPLACE field.

In general, I only use REPLACE ALL since Sublime Text 2 pretty much does the FIND ALL functionality automatically as you type into the FIND field.

To bring up the Find and Replace panel, press CMD + Option + F on a Mac (Ctrl + H on Windows or Linux) or go to the menu: Find > Replace.

I’m not going to give an example of simple text find and replace. If you understand how CSS styles apply to specific HTML elements by way of element or class names and varying levels of inheritance, you most likely grok how basic find and replace works.

Employing Regular Expressions Within Find and Replace

What I am primarily interested in is leveraging Regular Expressions within the Find and Replace mechanism to achieve magical, time-saving actions.

Regular Expressions make it possible to automate what would otherwise be grueling manual-insertion tasks. For example, say you need to convert tabulated data from a text file into tabulated data in an HTML table. Usually the only thing separating the values is a varying number of spaces. Like this:

    1   478 John Doe              48 M      6:20 
    2   472 Eddie Murphy           17 M        6:29 
    3   440 Indiana Jones      49 M       6:46

Basic Find and Replace would work fine in this scenario if a single, unique character were used to separate the different values – like the comma in a comma-delimited/comma-separated file. But when all you have is varying numbers of spaces, a more sophisticated tool like Regular Expressions is needed.

Actually, we could insert the table row and initial table data tags by leveraging the invisible line-break character in the data above. To get the necessary invisible characters use click-and-drag to select the line-break at the end of one line like this:

[Screenshot: a line break selected at the end of one line]

and Copy/Paste that into the FIND field. Then, in the REPLACE field type:

</td>
</tr>
<tr>
    <td>

and click REPLACE ALL. Every line break now closes one table row and opens the next.

But after that, we’re looking at a lot of manual select-and-paste work.
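For anyone who wants to see the line-break trick outside of Sublime Text, here is a rough Python sketch of the same idea. The sample rows are shortened and the variable names are my own for illustration; it only mimics what REPLACE ALL does with a copied line break, it is not how Sublime Text works internally.

```python
# Shortened sample rows, joined by the "invisible" line-break character.
rows = "1   478 John Doe\n2   472 Eddie Murphy\n3   440 Indiana Jones"

# The same text that was typed into the REPLACE field.
row_break = "</td>\n</tr>\n<tr>\n    <td>"

# Plain (non-regex) replace: every line break becomes the closing
# and opening row tags.
converted = rows.replace("\n", row_break)
print(converted)
```

Note that the very first row still needs its opening tags and the very last row its closing tags, since there is no line break before the first line or after the last one in this sample.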

Let’s first use Regular Expressions to isolate the first and last name values. They are unique in this file in that they are two words separated by only one single space.

[Screenshot]

Before we start employing Regular Expressions in the Find and Replace panel, we need to enable Regular Expressions by clicking the button by that name, which is the left-most button in the top row of the Find and Replace panel.

With the Regular Expressions mode activated, this is what I put in the FIND field to isolate that space in that particular location:

([a-z]) ([A-Z])

Which should make Sublime Text 2 look like this:

[Screenshot: the matches highlighted in Sublime Text 2]

So, that works in this instance! But why and how? Let’s break it down:

  • [ ]   Whatever expression is inside these brackets will match ONE character.
  • [a-z]   This expression means “ONE character that is any lowercase letter, a through z.”
  • [A-Z]   This expression means “ONE character that is any uppercase letter, A through Z.”
  • The “space” in between these two expressions is a literal “space” character.
  • ( )   Parentheses surrounding an expression make a “group”. Groups can be referred to by variables. In this case we defined two groups. Without any effort on your part, these groups are numbered, starting with variable “1” on the left-most group and counting upward to the right. This will come in handy when we fill in the REPLACE field.

In summary: The FIND field includes regular expressions that identify any single lowercase letter followed by a space and then any single uppercase letter.

Now, in the REPLACE field I will type the following:

\1</td><td>\2

and clicking REPLACE ALL will result in the following text:

[Screenshot: the text after clicking REPLACE ALL]

That also worked! But why and how? Let’s break it down:

  • \  is the “backslash” character. In Sublime Text 2’s REPLACE field, a backslash followed by a number is a “backreference”: it stands for whatever text matched the group with that variable number. The entire match does get replaced, but because the replacement writes that captured text back in, the end result is as if those characters had been left where they were.
  • In this example, we are telling Sublime Text 2 to write back the single lowercase letter associated with the variable “1”, followed by the literal text “</td><td>”, and then write back the single uppercase letter associated with the variable “2”.
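If it helps to see the FIND and REPLACE fields side by side, here is a small Python sketch that applies the same two patterns to the sample data. This is only an approximation of what the panel does: Python’s re module is a different regex engine from the one in Sublime Text, and it is case sensitive by default (like having the “Case sensitive” button enabled). The variable names are mine.

```python
import re

rows = [
    "1   478 John Doe              48 M      6:20",
    "2   472 Eddie Murphy           17 M        6:29",
    "3   440 Indiana Jones      49 M       6:46",
]

find = r"([a-z]) ([A-Z])"      # contents of the FIND field
replace = r"\1</td><td>\2"     # contents of the REPLACE field

# re.sub replaces every match in a string, like REPLACE ALL.
names_split = [re.sub(find, replace, row) for row in rows]

for row in names_split:
    print(row)
```

Only the single space between the first and last name is swapped for the tags; the letters on either side are captured by the groups and written straight back.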

The next target is between the Last Name data and the number to the right of it. I will put the following text into the FIND field:

([a-z]) *([0-9])

This actually ends up working better than I had anticipated. It selects the desired text as well as another series of spaces between data points:

[Screenshot: the matches highlighted in Sublime Text 2]

Since the first character is clearly supposed to be “lowercase only,” why is this second space between data points matching our search? Let’s break it down!

  • [a-z]   This expression means “ONE character that is any lowercase letter, a through z.”
  • Turns out I had the “Case sensitive” feature (right next to the Regular Expressions button) disabled, and as a result Sublime Text 2 was ignoring the case-sensitive aspects of my regular expressions! 🙂 Happy accident. Just something to be aware of if you are ever seeing confusing results.
  •  *   A literal “space” followed by an asterisk. The asterisk directly following another character means “match 0 or more of the preceding character” so this is quite useful for selecting spaces between data points that are made up of varying numbers of spaces.
  • ([0-9])   This expression means “ONE character that is any digit 0 through 9 (zero through nine).”

Since this lack of case-sensitivity actually worked for me, I went ahead with the same text in the REPLACE field as before and produced this result:

[Screenshot: the text after clicking REPLACE ALL]
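The case-sensitivity surprise is easy to reproduce. Here is a hedged Python sketch, using one row as it looks after the name replacement; re.IGNORECASE is my stand-in for having Sublime Text’s “Case sensitive” button disabled, and the variable names are made up for the example.

```python
import re

# One row as it looks after the first/last name replacement.
row = "1   478 John</td><td>Doe              48 M      6:20"

find = r"([a-z]) *([0-9])"     # letter, zero or more spaces, digit
replace = r"\1</td><td>\2"

# Case sensitive: only the lowercase "e" in "Doe" can start a match.
case_sensitive = re.sub(find, replace, row)

# Case insensitive: the uppercase "M" now matches [a-z] as well.
case_insensitive = re.sub(find, replace, row, flags=re.IGNORECASE)

print(case_sensitive)
print(case_insensitive)
```

With case sensitivity on, the gap between “M” and the time is left alone; with it off, both gaps are filled in one pass.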

With those two examples I feel I’ve covered some very useful basics of Regular Expressions within the context of Sublime Text 2’s Find and Replace panel.

A good resource for learning more about regular expressions: http://www.zytrax.com/tech/web/regex.htm#simple

 

Discovering Clay Shirky via Mr. Alan Cooper Quoting Him On the Perils of Categorizing Things In Advance

I keep wanting to refer back to this truly insightful tweet from Alan Cooper. Twitter is not a great place to keep things that will be useful for a long time. It can be grueling work to find a specific old tweet. As a result, here is the content of his tweet about the challenges of categorizing things in advance for documentation purposes:

“categorizing things in advance forces the categorizer to take on 2 jobs that are quite hard: mind reading, and fortune telling.”

and here is a screenshot of the tweet as well:

[Screenshot of the tweet]

 Thank you Mr. Alan Cooper for this wonderful little statement. As a result of wanting to frequently bring this quote to people’s attention I wanted to post this on my site. And after assembling the above it occurred to me that HE had put this in quotes himself. I didn’t notice that before! This was apparently not an original thought of his own but something somebody else said that he was sharing.

Naturally, the next thing to do was to sic Google on the quote and see what popped up.

Maybe I’m not smart enough to be following Alan Cooper on Twitter since I totally missed that the above tweet was quoting Mr. Clay Shirky from some talks that he gave in 2005 collectively titled, “Ontology is Overrated: Categories, Links, and Tags”. I am guessing these “talks” are well known in certain circles.

I wanted to share this little revelation about discovering the work of Clay Shirky. This is how I have learned throughout my whole life. This is how I know what I know. Looks like I have some reading to do!

Follow these guys on Twitter: @MrAlanCooper and @CShirky

SimpleInvoices: Invoice Template “Nebraska”

As someone who picks up a little freelance here and there, it can be handy to have some software that helps manage invoices and estimates. Software that isn’t Microsoft Excel which, while it will do the work, isn’t great for this purpose. Fortunately I discovered SimpleInvoices, a free and open source web-based invoice management program.

I was a bit disappointed that the default invoice style for SimpleInvoices didn’t resemble the illustration on the SimpleInvoices homepage. And, after seeing that the HTML template for the page was entirely constructed in tables, I went about creating a new, more contemporary HTML invoice template that mimics the template illustrated on the homepage.

(At least, the HTML-based print preview didn’t look that way. Maybe the export to PDF is a different story, but that functionality isn’t available to me.)

Since a template needs a name, I named this invoice template after my home state for now. I’m sure it’s not perfect, but it’s well-suited to my needs. Let me know if there are things that could be improved. You can download the files here:

http://simanek.us/downloads/SimpleInvoices_template_Nebraska_v1.2.zip

Notes:

Installation

  1. Extract files from ZIP archive after downloading the file.
  2. Copy the folder titled “Nebraska” to /templates/invoices in your SimpleInvoices installation.
  3. Log in to your SimpleInvoices program, navigate to Settings > System Preferences, edit the “Default Invoice Template” and select “Nebraska” from the list.
  4. If you have not yet specified your own logo image, upload your logo graphic (for printed and PDF’d invoices I recommend creating your logo in vector art and saving as an SVG file for use with SimpleInvoices) and navigate to People > Billers and click EDIT next to your name. Under “Logo file” you should be able to select your logo graphic.*
  5. Test the template by opening an estimate or invoice and clicking the “Print Preview” option. Use your browser’s printing functionality to print the invoice or save the output as a PDF.

* In order to use SVG files you will need to edit the following SimpleInvoices file: /include/functions.php – Open in text editor and look for “getLogoList” function and change the following line:

$ext = array("jpg", "png", "jpeg", "gif");

to include “svg”

$ext = array("jpg", "png", "jpeg", "gif","svg");

Save the file and now you can use the SVG version of your logo to get a crisp printed logo or a resolution-independent logo in your PDF file.

Firefox Tricks

There are limitations to relying on printing from web browsers to generate PDFs. One of the big limitations is that in general web browsers don’t print background colors or images. In the case of this template, that affects the gray background in the column heads and the yellow highlight behind the grand total. Fortunately Firefox (there might be other browsers that do this as well) gives the option to enable the printing of background colors and images in the Print dialog options.

The other aspect of printing from web browsers that is problematic is the automatic “Page 1 of 2” and “the title of this webpage” headers and footers on the resulting printout. Firefox also allows you to customize these or even turn them off entirely in the print dialog options. You’ll have to do this to get a good, clean invoice.

Layout

Originally I had tried to accommodate window envelopes by strictly formatting the Biller and Customer information sections. But with snail mail on the decline as a method for delivering invoices, version 1.2 discards that strict positioning in order to create a more flexible, robust and attractive layout.

Set a Custom Starting Point for a YouTube Video

I’ve been embedding videos from a variety of services into web pages for several years now. At the most basic, these services will allow you to embed a smaller version of the parent video. YouTube gives you the ability to adjust the display size and video player color scheme as well as some other interface options. Hulu is unique (as far as I know) in providing users with an easy-to-use custom clipping interface, allowing you to embed only a part of a selected video. I’ve gotten so comfortable with that feature that I’ve been hoping YouTube would introduce something similar.

Well, there’s good news. It’s not as slick as Hulu, but YouTube does provide some options on their embedded player. Most importantly there’s an option to specify a custom starting point for any embedded video. Here’s the parameter:

start

You can use this parameter by adding it to the source URLs in the provided embed code. For example, if I wanted to start my embedded video at the 1:12 mark, the YouTube embed code would look like this:


<object width="656" height="517"><param name="movie" value="http://www.youtube.com/v/tqXJzZ_T8kg?fs=1&amp;hl=en_US&amp;rel=0&amp;start=72"></param><param name="allowFullScreen" value="true"></param><param name="allowscriptaccess" value="always"></param><embed src="http://www.youtube.com/v/tqXJzZ_T8kg?fs=1&amp;hl=en_US&amp;rel=0&amp;start=72" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="656" height="517"></embed></object>

And the result will be the following:

You’re welcome!
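The start value is just the timestamp converted to seconds (1:12 is 1 × 60 + 12 = 72). If you would rather not do that arithmetic by hand, here is a small Python sketch. The function names to_seconds and embed_url are my own, and the other URL parameters are simply copied from the embed code above.

```python
from urllib.parse import urlencode


def to_seconds(timestamp: str) -> int:
    """Convert 'M:SS' or 'H:MM:SS' into whole seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def embed_url(video_id: str, timestamp: str) -> str:
    """Build the player URL with a start offset (hypothetical helper)."""
    params = {"fs": 1, "hl": "en_US", "rel": 0, "start": to_seconds(timestamp)}
    return "http://www.youtube.com/v/" + video_id + "?" + urlencode(params)


print(to_seconds("1:12"))
print(embed_url("tqXJzZ_T8kg", "1:12"))
```

When pasting the resulting URL into HTML, remember to write each & as &amp; as in the embed code above.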

HTML Character Chart Update: Sound Recording Copyright Symbol

Sound Recording (Phonogram) Copyright Symbol

A friend was looking for the circle-P character to include on the jacket design for a musical recording. I was surprised to discover a glyph/character that I wasn’t aware of. We weren’t even sure what this symbol meant in relation to the copyright symbol. Turns out this symbol, the Phonogram Copyright Symbol or Sound Recording Copyright Symbol, protects the copyright of the sound recording itself as something separate from the written music and lyrics. Good things to know!

Regardless, this character can now be found in my ever-growing HTML Character Code tool. Enjoy!

HTML Character Chart Update: Polish Alphabet

While working on an upcoming new website for Gramps (Free Open Source Genealogical Research tool that I contribute to) I am learning the challenges of developing a multilingual international website. In working with some translations I discovered that my character chart did not include characters from the Polish alphabet!

Needless to say, my character chart now includes decimal and hexadecimal/Unicode references for the characters in the Polish alphabet.

HTML CHARACTER CODES

Designing Around WordPress

If you haven’t noticed, I’m doing something really stupid: I’m learning about WordPress Templates by way of making changes to my live site. It’s interesting how they built the Kubrick template. It’s probably very brilliant in its way of dealing with qualities of CSS. However, as an example to learn from, it has a lot of idiosyncrasies.

So far, I feel like this isn’t a bad night’s work in adapting an existing style from a ground-up site to a WordPress template. I hope the site remains usable during this transition period.

GRAMPS 3.0 Coming Soon!

I am dabbling in software development with the GRAMPS (Genealogical Research and Analysis Management Programming System) project. It is a genealogical database building program that is available for Linux.

I got involved with the developer group after struggling to make my genealogy site look great. The markup techniques were outdated or wrong and nothing had an ‘id’ or ‘class’. After recommending the change to the devs, they let me know that it wasn’t a top priority and recommended that if I wanted to improve the output, I would have to do it myself. This would involve working with Python and also looking stupid asking dumb questions about version control and compiling GRAMPS myself to check my work. With a little time and patience (as well as being unemployed for three months in Nashville with just my wife for company) I put myself to the task.

I have been updating the ‘Narrative Web’ plugin, written in Python, that exports your genealogical data into a web site. Mostly I just corrected and updated the XHTML markup that was present amidst the Python. I did hack up the code a little so that the navigation could be styled to indicate the active page or section.

My primary focus was to make the sites more accessible to CSS. In the process I created a few style themes to be distributed with the application. I am quite proud of them and excited to hear feedback from the user community once GRAMPS 3.0 is released. Following are a few screenshots of the site output as it was and the four primary styles that I have developed so far.

GRAMPS 2.2 Narrative Web Plugin Output

‘Modern’

GRAMPS 2.2 Modern Style

‘Tranquil’

GRAMPS 2.2 Tranquil Style

GRAMPS 3.0 Narrative Web Plugin Output

‘Basic – Ash’

Basic comes in a variety of color schemes and is based on the original ‘Modern’ style. I hate using the word ‘modern’ outside of discussions of philosophy or fine art. The general public in the U.S. is convinced that it means ‘contemporary’. No doubt this is thanks to decades of marketing professionals trying to make their products sound impressive.

GRAMPS 3.0 Basic Style

‘Nebraska’

I named this after my home state. It was my original stylesheet for the new markup. For that reason a lot of id and class solutions in the markup came out of challenges created by this design. I wanted this design to look fresh and inviting while being very easy to read.

GRAMPS 3.0 Nebraska Style

‘Mainz’

Named so for Gutenberg’s birthplace, this design was created to show off the potential of the new markup. It’s a bit repetitious of me, but for some reason the vision of the website as a sheet of paper is very appealing. I started this one based on the name of one of the original GRAMPS styles: Certificate. The original style didn’t really look like a certificate, but this one does.

GRAMPS 3.0 Mainz Style

Default Print Style

This is probably the style that will be the most overlooked. With XHTML + CSS there is the potential for the browser to automatically switch stylesheets based on the media of representation. You can define one stylesheet as ‘screen’ and another as ‘print’ (there are actually quite a few different media types defined by the W3C, including ‘handheld’ and ‘projection’). GRAMPS is the first project where I could form a strong argument for using this feature to its full potential. Now anytime someone prints a page from a Narrative Web site, the output will be well designed for print with an emphasis on efficiency and legibility.

GRAMPS 3.0 Default Print Style

There’s still more work to do, but I wanted to get the word out on this great update to GRAMPS that’s just around the corner. Along with my relatively minor contributions, the other developers have been working very hard to make 3.0 an impressive and powerful update. If you are interested in trying it out, do keep in mind that in the open source world ‘.0’ means ‘submit final work to users and fix a lot of reported bugs’. So, if you are looking for a perfect application, I recommend waiting for GRAMPS 3.1. 😉

HTML Character Reference Chart Update

I have just finished updating my HTML Character Reference Chart. Along with updating invalid numerical references 129 through 159 to the valid decimal numbers there are a few new features:

  • Different character sets in separate views
  • Complete decimal and hexadecimal references
  • Entity references for some characters
  • New Complete Table Section displays all references
  • New Favorites Section displays only your favorites
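As a rough illustration of the numeric reference fix mentioned above: references in the 128 through 159 range are usually leftovers from the Windows-1252 character set, and the valid decimal number is the Unicode code point of the same character. The Python sketch below assumes that Windows-1252 mapping (which may not be exactly how the chart was corrected), and fix_reference is a made-up name.

```python
def fix_reference(number):
    """Map a numeric reference in 128..159 to a Unicode code point.

    Assumes the invalid number was meant as a Windows-1252 byte.
    Returns None when Windows-1252 assigns no character to it.
    """
    if not 128 <= number <= 159:
        return number  # outside the problem range, leave it untouched
    try:
        return ord(bytes([number]).decode("cp1252"))
    except UnicodeDecodeError:
        return None  # e.g. 129 has no character in Windows-1252


for bad in (150, 153, 129):
    good = fix_reference(bad)
    if good is None:
        print("&#%d; has no valid replacement" % bad)
    else:
        print("&#%d; should be &#%d;" % (bad, good))
```

So the en dash often written as &#150; becomes &#8211;, and the trademark sign written as &#153; becomes &#8482;.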

HTMLChar1

This new version is built with XML and XSLT, making the process of updating information or creating new layouts very simple. I am learning the basics of both right now and it’s amazing how powerful and elegant these simple text files can be.

The Favorites section works with comma-separated URL values. You will have to edit your URL “manually” by adding the decimal, hexadecimal or alphabetical values to the end of the page’s URL. There is an example favorites set to help you get started.

BohemianAlps HTML Character Codes »

Internet Explorers 6, 5.5 and 5 on Linux

This is a pretty sweet and easy set up for web developers on Linux. Just download and double-click and IEs4Linux takes care of itself. It is running IE using Wine. Apparently this is the same old Internet Explorer application surrounded by some programs that allow it to run on Linux. They warn about security issues and not using it as your primary browser, but I honestly don’t know who’d think of doing that. Anybody that is using Linux and goes to the trouble of getting this to run IE on it isn’t going to use it for recreation.

How does it work? Looks good to me. For troubleshooting anyway. The fonts are weird but that’s expected. IE 6 gives me lots of bold type and the IE 5s don’t have that problem. They’re just not anti-aliased. These IEs also run as fast as they would on Windows as far as I can tell. Actually, on my Pentium II 266MHz laptop, they load faster than Firefox.

How do you get it? Well, no need to comprehend Wine, thank god. (Has anybody else noticed that there is no explanation on their site of how to even attempt installing a Windows app on Linux?) Just go to IEs4Linux and follow the instructions on that page. You will have to install Wine and Cabextract and then finally run their IEs4Linux file. You will be in IE heaven soon! Now, if only they had a setup package for OSX.