ShibalBot – A Python IRC Bot using the Twisted library

I used to run eggdrops back in the day when we would idle in IRC channels for Counter-Strike, and I wrote custom Tcl scripts to do all that annoying IRC bot stuff (like listing scrim availabilities, CS clan rosters, etc). So while I’m learning Python, I figured I’d try to write my own bot to see if I could do it. I first started off with this example that builds the core of the bot in less than 30 lines of code. While that was fantastic, I then came across this example which uses the Twisted lib and makes it much more robust. I didn’t implement the Markov chains, but I wrote up some quick code that can handle saving and displaying IRC quotes. This version of the bot is now just under 180 lines.

You can view the code here.

To use it, you have to change this section at the bottom of the file.

if __name__ == "__main__":
    reactor.connectTCP("YOUR_IRC_SERVER_HOST",
                       IRC_SERVER_HOST_PORT,
                       ShibalBotFactory("YOUR_IRC_CHANNEL",
                                        "YOUR_IRCBOT_NICK",
                                        True, False))
    reactor.run()

Just change the parameters to your own IRC server address, server port (an integer, not a string), IRC channel, and bot nickname and you’re good to go. Some todos include adding code to fetch the titles of URLs pasted in the channel, making the bot modular so you can just drop in scripts instead of having to change the main file, and updating the quotes module so it doesn’t load the entire quotes file at once (in case the file becomes huge).
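For the last todo item, one way to avoid loading the whole quotes file is to pick a random quote in a single pass over it. This is only a sketch in Python 3, assuming one quote per line; the `random_quote` name is made up for illustration and is not taken from the bot's code.

```python
import random


def random_quote(lines, rng=random):
    """Return one non-empty line chosen uniformly at random, or None.

    Uses reservoir sampling, so only the current line and the current
    pick are held in memory, however large the input is.
    """
    chosen = None
    seen = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        seen += 1
        # Keep the new line with probability 1/seen.
        if rng.randrange(seen) == 0:
            chosen = line
    return chosen
```

Since a file object yields its lines one at a time, passing an open quotes file straight to `random_quote` should work without reading it all into a list first.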

Script to sync friends list across multiple reddit accounts

While teaching myself Python, I wanted to try to write something that I would actually use. Since I have multiple reddit accounts, a script that would sync settings across them seemed like it would be useful. So far, the only thing being synced is friends, but I’ll soon have something that also syncs subscribed subreddits (hopefully).

The script is being hosted on github and the link is here.

I developed the script using Python 2.7.2 so I don’t even know if it is compatible with Python 3.x. Using the script is fairly easy. Just edit the top of the file above the line that says “DO NOT MODIFY BELOW THIS LINE”.

Example:

login_info = [
["user1", "pass1"],
["user2", "pass2"],
["user3", "pass3"]
]

add_self_to_friends should be either "True" or "False"
(without quotations)

I think I’ll add in some logic to do interactive input if no settings are detected in the file. I’m also going to change the script to do DOM parsing as opposed to regex, because the regex could run into some weird issues (I know I’ve said DOM parsing is a hassle before, but the goal here is to learn as much Python as possible, so I’ll try attacking problems from all angles to learn everything).
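As a rough idea of what the parser-based approach could look like, here is a sketch using `html.parser` from the Python 3 standard library (the module is called `HTMLParser` in 2.7). The markup and the `friend` class name are invented for the example; this is not the actual reddit page structure.

```python
from html.parser import HTMLParser


class FriendLinkParser(HTMLParser):
    """Collect the text of every <a> element whose class includes "friend"."""

    def __init__(self):
        super().__init__()
        self.friends = []
        self._in_friend_link = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "friend" in classes:
            self._in_friend_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_friend_link = False

    def handle_data(self, data):
        if self._in_friend_link and data.strip():
            self.friends.append(data.strip())


# Made-up sample markup, not actual reddit output.
sample = ('<ul><li><a class="friend" href="/user/alice">alice</a></li>'
          '<li><a href="/about">about</a></li>'
          '<li><a class="user friend" href="/user/bob">bob</a></li></ul>')

parser = FriendLinkParser()
parser.feed(sample)
print(parser.friends)  # ['alice', 'bob']
```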

mentally mutilated

“It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.”

-Edsger W. Dijkstra

This quote ran through my head today when I realized (after 5 minutes of head scratching) why this line was throwing an exception in JavaScript:

var sHeading = oInput(0);

Parentheses for array indexing? Damn you VB!

regex vs parsing

Jeff Atwood wrote a really good blog post a while back about the perils of abusing regular expressions, which you can read here. The basic gist was that you shouldn’t rely on regexes as the solution to every problem (which languages like Perl encourage you to do). I agree with this. However, there are some cases where regular expressions are simply the most expedient (but perhaps not the most efficient) solution, and you just cannot ignore the simplicity and convenience they offer.

One of the things we do at work is analyze HTML from news sites. Lots of it. We need to be able to look at a page of HTML and extract certain sections for information gathering/processing. Programming purists will tell you that you MUST write a proper HTML parser with a lexer and tokenizer, because there exists some input that will break your regex. That’s fine and dandy for something with a broad input spectrum (blogs, forums, etc), but we deal mainly with the output of news CMSes. In fact, many news organizations use similar CMSes, which makes this job a lot easier. You just have to see the CMS output pattern/template and write a regex to extract the info you need.

What are the advantages to this? Fast turnaround time. In an industry that’s fast paced and constantly changing, you can roll out changes and keep up with any news site that decides to change its article structure every few months. Instead of retooling a DOM-based parser, where you’d probably have to change a complicated document definition for every source, you can simply adjust a client-specific regex and be rolling in a matter of minutes.
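To make the per-source idea concrete, here is a small Python sketch. The source names, patterns, and markup are all invented for illustration; they are not real client patterns.

```python
import re

# One pattern per news source (hypothetical names and templates).
# When a site changes its article template, only its entry is edited.
SOURCE_PATTERNS = {
    "example-times": re.compile(
        r'<div class="article-body">(?P<body>.*?)</div>', re.DOTALL),
    "example-post": re.compile(
        r'<!-- story start -->(?P<body>.*?)<!-- story end -->', re.DOTALL),
}


def extract_body(source, html):
    """Return the article body for a known source, or None if no match."""
    match = SOURCE_PATTERNS[source].search(html)
    return match.group("body").strip() if match else None


page = '<html><div class="article-body">\n<p>Hello.</p>\n</div></html>'
print(extract_body("example-times", page))  # <p>Hello.</p>
```

Keeping the patterns in one table means a template change on one site touches a single line, which is where the fast turnaround comes from.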

What are the disadvantages? Efficiency. Regexes aren’t known for being the fastest tool in the programmer’s toolbox. While your regex is being evaluated, there are hundreds, if not thousands of computations going on as well as regex trees being built to see if there’s a match.

In the end, researching a non-regex based solution to this problem isn’t exactly a waste of time. But you simply cannot ignore the simplicity this solution offers in this type of industry-specific situation. Most of the arguments I’ve seen against regex-based parsing of html revolve around hypothetical input scenarios, but I’ve rarely seen anything that I couldn’t write a regex for. Even something as bad as this (which was lifted from one of the biggest news sources around):

<span class="focusParagraph"><p>
<span class="articleLocatio</span>n">

Experimenting with Aptana

So I’ve decided to start up yet another web project to add to my already long list of unfinished items hidden away in my digital repository closet. This time the language is PHP and I thought we could try out something new. For the project, I’m basically creating a custom CMS to show article-style pages along with a search feature. My experience with Zend Studio was a favorable one, so it will be interesting to see how much of it was due to Zend and how much was due to Eclipse (Aptana is also an Eclipse-based IDE). I’ve also heard NetBeans is a good alternative PHP IDE, but I haven’t tried it yet. I’ll be posting again soon with my impressions.

Check Aptana out here

revisit to project euler with c++

As a way to get reacquainted with C++, I’m revisiting all my PE problems. Here’s the code for #1.

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 1000.

#include <cstdlib>
#include <iostream>

int main()
{
    const int iMax = 1000;
    int iSum = 0;
    // Check each number once so that multiples of both 3 and 5
    // (15, 30, ...) are not added twice.
    for (int i = 1; i < iMax; i++)
    {
        if (i % 3 == 0 || i % 5 == 0)
        {
            iSum += i;
        }
    }
    std::cout << iSum << std::endl;
    std::system("PAUSE"); // Windows-only: keeps the console window open
    return 0;
}
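As a cross-check on the loop, the same sum can be computed with the arithmetic series formula and inclusion-exclusion: add the multiples of 3 and of 5, then subtract the multiples of 15, which would otherwise be counted twice. A short Python sketch (the function names are my own):

```python
def sum_of_multiples(k, limit):
    """Sum of the positive multiples of k that are strictly below limit."""
    n = (limit - 1) // k          # how many multiples of k are below limit
    return k * n * (n + 1) // 2   # k * (1 + 2 + ... + n)


def euler1(limit):
    """Sum of all multiples of 3 or 5 below limit."""
    return (sum_of_multiples(3, limit)
            + sum_of_multiples(5, limit)
            - sum_of_multiples(15, limit))


print(euler1(10))    # 23, matching the example in the problem statement
print(euler1(1000))  # 233168
```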

Getting a Prime

Project Euler has quickly become my hobby for when I have small moments of free time.  Here is problem #7:

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.

What is the 10001st prime number?

and this is my brute force solution:

Public Sub GetPrime()
    Dim iPrimeMax As Integer = 10001
    Dim iPrimeCounter As Integer = 0
    Dim iCurrentNum As Integer = 2
    While (iPrimeCounter < iPrimeMax)
        Dim bPrime As Boolean = True
        Dim iCounter As Integer = 1
        While (bPrime = True And iCounter < iCurrentNum)
            If (iCurrentNum Mod iCounter = 0 And iCounter <> 1) Then
                bPrime = False
            End If
            iCounter += 1
        End While
        If (bPrime = True) Then
            iPrimeCounter += 1
        End If
        iCurrentNum += 1
    End While
    Response.Write((iCurrentNum - 1).ToString())
End Sub
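The brute force version tests every candidate divisor below the current number. A common speed-up is to stop at the square root and skip even divisors. Here is a sketch of that idea, written in Python rather than VB (the function names are my own):

```python
def is_prime(n):
    """Trial division by 2 and by odd numbers up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def nth_prime(n):
    """Return the n-th prime, counting 2 as the first."""
    count = 0
    candidate = 1
    while count < n:
        candidate += 1
        if is_prime(candidate):
            count += 1
    return candidate


print(nth_prime(6))      # 13, as in the problem statement
print(nth_prime(10001))  # 104743
```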

Project Euler #6

The problem:

The sum of the squares of the first ten natural numbers is,

1² + 2² + … + 10² = 385

The square of the sum of the first ten natural numbers is,

(1 + 2 + … + 10)² = 55² = 3025

Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is 3025 − 385 = 2640.

Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.

My solution:

Public Sub Difference()
    Dim iStart, iEnd, iSumOfSquares, iSquareOfSum As Integer
    iStart = 1
    iEnd = 100
    iSumOfSquares = 0
    iSquareOfSum = 0

    For iCounter As Integer = iStart To iEnd
        iSumOfSquares = iSumOfSquares + iCounter ^ 2
        iSquareOfSum = iSquareOfSum + iCounter
    Next

    Response.Write(iSquareOfSum ^ 2 - iSumOfSquares)
End Sub
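The loop can also be replaced by the closed-form expressions for both sums: 1 + 2 + … + n = n(n + 1)/2 and 1² + 2² + … + n² = n(n + 1)(2n + 1)/6. A quick Python sketch to check the numbers (the function name is my own):

```python
def square_sum_difference(n):
    """(1 + ... + n)^2 minus (1^2 + ... + n^2), using closed forms."""
    sum_to_n = n * (n + 1) // 2
    sum_of_squares = n * (n + 1) * (2 * n + 1) // 6
    return sum_to_n ** 2 - sum_of_squares


print(square_sum_difference(10))   # 2640, as in the problem statement
print(square_sum_difference(100))  # 25164150
```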