Tuesday, January 24, 2012

IRC message Regex

So I'm in the process of making an IRC bot, and one of the problems I always seem to have is parsing the message. If you don't know what parsing means, the simplest explanation is breaking the message down into some form of object that organizes its data.

Normally I would split the message, check each of the resulting pieces, and try to organize them, but while searching recently I found an easier way. Keep in mind that this is client-side parsing. I found the original by Caleb Delnay, so original credit goes to him, but I wanted to expand on it and convert it to work with the subtle differences in regex syntax across other languages.

*** Someone else has since added further improvements. I found them through some referrals in my stats, thanks to a lovely accreditation to this post. Check out more info at mybuddymichael.com. I'll try to integrate the changes into my post once I can take some time to give them a good look over and try them out a bit (hopefully it will help with a new bot I'm currently working on in Haskell).

The original .NET compatible regex is
^(:(?<prefix>\S+) )?(?<command>\S+)( (?!:)(?<params>.+?))?( :(?<trail>.+))?$

Python (place in an r"" string so you won't need to escape backslashes), Perl, PHP, and AS3:
^(:(?P<prefix>\S+) )?(?P<command>\S+)( (?!:)(?P<params>.+?))?( :(?P<trail>.+))?$
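For Python, here is a minimal sketch of how the pattern above might be used. The names `IRC_RE` and `parse` and the sample messages are my own illustrations, not part of the original regex; the only library calls are `re.compile`, `match`, and `groupdict` from the standard `re` module.

```python
import re

# The Python-flavoured pattern from above, kept in a raw string.
IRC_RE = re.compile(
    r"^(:(?P<prefix>\S+) )?(?P<command>\S+)"
    r"( (?!:)(?P<params>.+?))?( :(?P<trail>.+))?$"
)

def parse(line):
    """Return a dict of the named groups, or None if the line doesn't match.

    Hypothetical helper: strips the trailing CR/LF that IRC lines end with
    before matching. Groups that did not take part in the match are None.
    """
    m = IRC_RE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    return m.groupdict()

msg = parse(":nick!user@host PRIVMSG #channel :hello world\r\n")
print(msg["prefix"], msg["command"], msg["params"], msg["trail"])

ping = parse("PING :irc.example.net")
print(ping["command"], ping["trail"])
```

Note that `params` comes back as a single string, so if a command carries more than one parameter you would still need to split it on spaces yourself.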

Java (versions before 7 don't support named groups, so look at groups 2, 3, 5, and 7):
^(:(\\S+) )?(\\S+)( (?!:)(.+?))?( :(.+))?$

Java (7 and up supports named groups; I have not tried this yet):
^(:(?<prefix>\\S+) )?(?<command>\\S+)( (?!:)(?<params>.+?))?( :(?<trail>.+))?$

JavaScript (no named groups, so use groups 2, 3, 5, and 7; this is a regex literal, so it does not need to be in a string):
/^(:(\S+) )?(\S+)( (?!:)(.+?))?( :(.+))?$/
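To show why the unnamed versions use groups 2, 3, 5, and 7, here is a rough sketch of the same numbered pattern in Python. Groups 1, 4, and 6 are the optional wrappers around the prefix, params, and trail, so the actual values land in the groups between them. `NUMBERED_RE` and `parse_numbered` are names I made up for this example.

```python
import re

# Same pattern without named groups. Group numbers count opening parens:
#   1 = ":prefix "   2 = prefix
#   3 = command
#   4 = " params"    5 = params
#   6 = " :trail"    7 = trail
NUMBERED_RE = re.compile(r"^(:(\S+) )?(\S+)( (?!:)(.+?))?( :(.+))?$")

def parse_numbered(line):
    """Return (prefix, command, params, trail), or None if there is no match."""
    m = NUMBERED_RE.match(line)
    if m is None:
        return None
    # group() with several arguments returns a tuple of those groups.
    return m.group(2, 3, 5, 7)

print(parse_numbered(":irc.example.net NOTICE nick :Welcome"))
print(parse_numbered("JOIN #channel"))
```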

The basic premise is the assumption that messages are formatted along the lines of :<prefix> <command> <params> :<trailing>, where every part except the command is optional. If you know a better way to do any of them, or know ways in languages I left out, let me know. As far as the regex methods and ways to work it out, that is up to you. I am just supplying the pattern, and it is up to you to use it correctly.

Since regex can be complicated, hopefully this saves everyone some time. Figuring out the needed methods to use it shouldn't be too hard.

After a comment about some stuff in the RFC, I played around with trying to make the regex work with that specification and came up with a partially working version. Due to the complexity, my lack of knowledge, and the lack of benefit from this, I will only post the one edit I made and hopefully not bother with this again. While the regex is good for something quick and dirty, string methods seem to be more practical.
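For comparison, here is a rough sketch of what the string-method approach might look like in Python. This is only my illustration of the idea, not a full RFC parser: `parse_irc` is a hypothetical name, and it assumes the trailing part is always introduced by " :", which ignores the RFC case where a final parameter can appear without the colon.

```python
def parse_irc(line):
    """Split a raw IRC line into (prefix, command, params, trailing).

    Sketch only: prefix and trailing are None when absent, and params
    is a list of the space-separated middle parameters.
    """
    line = line.rstrip("\r\n")
    prefix = None
    trailing = None

    # A leading ':' marks the prefix, which runs up to the first space.
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")

    # The first " :" starts the trailing part, which may contain spaces.
    if " :" in line:
        line, _, trailing = line.partition(" :")

    parts = line.split()
    command = parts[0] if parts else None
    params = parts[1:]
    return prefix, command, params, trailing

print(parse_irc(":nick!user@host PRIVMSG #channel :hello world"))
print(parse_irc("MODE #channel +o nick"))
```

Unlike the regex, this hands back the parameters already split into a list, which is part of why string methods feel more practical here.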

^(:(?P<prefix>\S+) )?(?P<command>\S+)( (?!:)(?P<params>\S{14} (:)?|.+ :?))?((?P<trail>.+))?$

The params section will end up with either a trailing space or a space and colon. That's the best I could do, and the last I'll do of this.
