About andy

in west midlothian, born and raised... on the playground was where i spent most of my days... but yea, i'm a freak.

Steve Jobs has died

Steve Jobs has died at age 56. The internet has gone insane with everyone and their mother trying to eulogize Jobs. I even read an article where the author said he was “Moses in a turtleneck.” That’s just crazy, but whatever. While I didn’t agree with his business practices or philosophy, I acknowledge his role as a tech giant and someone who helped shape the industry we work in today. RIP. It will be interesting to see where Apple goes from here without Jobs at the helm.

ShibalBot – A Python IRC Bot using the Twisted library

I used to run eggdrops back in the day when we would idle in IRC channels for Counter-Strike, and I wrote custom Tcl scripts to do all that annoying IRC bot stuff (like listing scrim availabilities, CS clan rosters, etc.). So while I’m learning Python, I figured I’d try to write my own bot to see if I could do it. I started off with this example, which builds the core of the bot in less than 30 lines of code. While that was fantastic, I then came across this example, which uses the Twisted library and makes the bot much more robust. I didn’t implement the Markov chains, but I wrote up some quick code that can handle saving and displaying IRC quotes. This version of the bot is now just under 180 lines.

You can view the code here.

To use it, you have to change this section at the bottom of the file.

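if __name__ == "__main__":
    # Replace the placeholders below with your own values.
    # IRC_SERVER_HOST_PORT must be an integer port number, not a string.
    reactor.connectTCP("YOUR_IRC_SERVER_HOST",
                       IRC_SERVER_HOST_PORT,
                       ShibalBotFactory("YOUR_IRC_CHANNEL",
                                        "YOUR_IRCBOT_NICK",
                                        True, False))
    reactor.run()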

Just change the parameters to your own IRC server address, server port, IRC channel, and bot nickname and you’re good to go. Some to-dos: fetching the titles of URLs pasted in the channel, making the bot modular so you can just drop in scripts instead of having to change the main file, and updating the quotes module so it doesn’t load the entire quotes file at once (in case the file becomes huge).
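For the quotes to-do, here is a rough sketch of one way to pick a random quote without reading the whole file into memory. It is not the code from ShibalBot: the function names, the one-quote-per-line file layout, and the use of Python 3 are all assumptions made for illustration. It streams the file and uses reservoir sampling, so memory use stays flat however big the file gets.

```python
import random


def add_quote(path, quote):
    """Append one quote as a single line; nothing is read into memory."""
    # Assumed layout: one quote per line, so embedded newlines are flattened.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(quote.replace("\n", " ").strip() + "\n")


def random_quote(path, rng=random):
    """Return a uniformly random non-empty line, or None if there are none.

    Uses reservoir sampling: the file is streamed line by line and only
    the currently chosen quote is kept.
    """
    chosen = None
    seen = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            seen += 1
            # Keep the current line with probability 1/seen.
            if rng.randrange(seen) == 0:
                chosen = line
    return chosen
```

The trade-off is that every lookup still reads the file from start to finish, so this saves memory, not disk reads.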

Script to sync friends list across multiple reddit accounts

While teaching myself Python, I wanted to try to write something that I would actually use. Since I have multiple reddit accounts, a script that would sync settings across them seemed like it would be useful. So far, the only thing being synced is friends, but I’ll soon have something that syncs subscribed subreddits as well (hopefully).

The script is hosted on GitHub and the link is here.

I developed the script using Python 2.7.2 so I don’t even know if it is compatible with Python 3.x. Using the script is fairly easy. Just edit the top of the file above the line that says “DO NOT MODIFY BELOW THIS LINE”.

Example:

login_info = [
["user1", "pass1"],
["user2", "pass2"],
["user3", "pass3"]
]

add_self_to_friends should be either True or False (no quotation marks).

I think I’ll add in some logic to do interactive input if no settings are detected in the file. I’m also going to change the script to do DOM parsing instead of regex, because the regex could run into some weird issues (I know I’ve said DOM parsing is a hassle before, but the goal here is to learn as much Python as possible, so I’ll try attacking problems from all angles to learn everything).
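As a rough idea of what the parser-based approach could look like, here is a minimal sketch using the standard library's HTMLParser. A few loud assumptions: it targets Python 3 (on 2.7 the module is named HTMLParser), the "friend" class name and the sample markup are made up and are not reddit's real HTML, and HTMLParser is an event-driven parser rather than a full DOM, swapped in here only because it needs no extra install.

```python
from html.parser import HTMLParser


class FriendLinkParser(HTMLParser):
    """Collect the text of <a> tags whose class list contains 'friend'.

    The class name 'friend' is a hypothetical placeholder, not real markup.
    """

    def __init__(self):
        super().__init__()
        self.friends = []
        self._in_friend_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None.
            classes = (dict(attrs).get("class") or "").split()
            self._in_friend_link = "friend" in classes

    def handle_data(self, data):
        if self._in_friend_link and data.strip():
            self.friends.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_friend_link = False


def extract_friends(html_text):
    """Return the link text of every friend link found in html_text."""
    parser = FriendLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.friends
```

Unlike a regex, this doesn't care about attribute order, quoting style, or extra classes on the tag, which is where the weird issues tend to come from.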

PeerBlock doesn’t play nicely with others

I just spent a good 15 minutes trying to figure out why the hell Steam wouldn’t start up. It was giving me an error along the lines of “Could not connect to Steam network…” but I was able to get on the internet and load up Steam’s own website. Turns out it was PeerBlock refusing connections from Valve. I just right-clicked on the Valve IP in PeerBlock’s window and allowed it permanently to fix the issue. Hope this helps someone else with this problem.

foodie mementos

Posting some pics of food that I’ve made lately.

Cast iron skillet pizza with pepperoni and onions.

Tonkatsu with rice and shredded red cabbage.

Galbi tang (Korean short rib soup)

Dukguk (Korean rice cake soup)

Traditional strawberry shortcake.

April Update

It’s been a while since my last post so I thought I’d just give a quick update. I got a new laptop to do some work on: a flashy new Lenovo ThinkPad T510. It is quite sexy and reminds me of how long it’s been since I’ve gotten anything new hardware-wise. I remember buying laptops and getting friggin CDs with recovery software and backup discs with Windows, with perhaps a recovery partition on top. Fast forward to today. No more optical media being shipped with anything. Just the recovery partitions. This is probably just the manufacturer’s way of cutting costs, which they’ll argue is “more green,” but really, it ends up costing you more to do the same shit. You can still get recovery discs. Only now it costs you $50-100 to order them when you used to get them INCLUDED IN THE PRICE OF THE LAPTOP. It could also be that as I get older, I am getting more and more concerned with my money and where it goes (read as: CHEAP). Maybe I’ll do a hardware review of the laptop on here when I get around to it. I’m still working on installing Arch Linux on that badboy (which is quite frustrating). Ubuntu installed in almost 10 minutes flat, no problem. Arch installs, but I can’t for the life of me get the X server working on there. Whenever I figure that out, I’ll post it here to help any other unlucky souls in the same predicament.

I’m also going to try to do some cooking/food blog posts on here, or break away and start up my own foodie blog. I’ve been experimenting with some different pizza recipes, and have decided that New York style crust kicks Chicago crust’s ass. My last couple of deep-dish crust recipes have come out way too thick. Next time I’m going for a simple New York style thin crust with maybe some fresh mozzarella and marinara with basil. I’ve also learned how to make a kickass bulgogi recipe which I’m definitely going to put up on here with some pics. Tomorrow will be bacon-wrapped barbecue chicken with macaroni and cheese. Mmmmmmm!

mentally mutilated

“It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration.”

-Edsger W. Dijkstra

This quote ran through my head today after I realized (after 5 minutes of head scratching) why this line was throwing an exception in JavaScript:

var sHeading = oInput(0);

Parentheses for array indexing? Damn you, VB!

regex vs parsing

Jeff Atwood wrote a really good blog post a while back about the perils of abusing regular expressions, which you can read here. The basic gist was that you shouldn’t rely on regexes as the solution to every problem (which languages like Perl encourage you to do). I agree with this. However, there are some cases where regular expressions are simply the most expedient (but perhaps not the most efficient) solution, and you just cannot ignore the simplicity and convenience they offer.

One of the things we do at work is analyze HTML from news sites. Lots of it. We need to be able to look at a page of HTML and extract certain sections for information gathering/processing. Programming purists will tell you that you MUST write a proper HTML parser with a lexer and tokenizer, because there exists some input that will break your regex. That’s fine and dandy for something with a broad input spectrum (blogs, forums, etc.), but we deal mainly with the output of news CMSes. In fact, many news organizations use similar CMSes, which makes this job a lot easier. You just have to see the CMS output pattern/template and write a regex to extract the info you need.

What are the advantages of this? Fast turnaround time. In an industry that’s fast paced and constantly changing, you can roll out changes and keep up with any news site that decides to change its article structure every few months. Instead of retooling a DOM-based parser, where you’d probably have to change a complicated document definition for every source, you can simply adjust a client-specific regex and be rolling in a matter of minutes.
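To make the "client-specific regex" idea concrete, here is a minimal sketch of a per-source pattern table. None of this is our actual code: the source names, the markup in the patterns, and the function name are invented for illustration. The point is only that supporting a changed template means editing one entry in the table.

```python
import re

# Hypothetical per-source patterns; source names and markup are invented.
SOURCE_PATTERNS = {
    "example-daily": re.compile(
        r'<div class="article-body">(?P<body>.*?)</div>', re.DOTALL
    ),
    "example-wire": re.compile(
        r"<!-- story start -->(?P<body>.*?)<!-- story end -->", re.DOTALL
    ),
}

# Crude tag stripper, good enough for a sketch.
TAG_RE = re.compile(r"<[^>]+>")


def extract_body(source, html_text):
    """Return the article text for a known source, or None if no match.

    The non-greedy match stops at the first closing delimiter, so a
    nested <div> inside the body would cut the result short.
    """
    match = SOURCE_PATTERNS[source].search(html_text)
    if match is None:
        return None
    text = TAG_RE.sub(" ", match.group("body"))
    # Collapse the whitespace left behind by the removed tags.
    return " ".join(text.split())
```

When a site redesigns, the fix is a one-line change to its entry in SOURCE_PATTERNS, and a None return is the cue that a template has probably changed.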

What are the disadvantages? Efficiency. Regexes aren’t known for being the fastest tool in the programmer’s toolbox. While your regex is being evaluated, there are hundreds, if not thousands, of computations going on, as well as regex trees being built to see if there’s a match.
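If you want to know what a pattern actually costs, measuring beats guessing. Below is a small sketch, with a made-up headline pattern and hypothetical function names, that compiles the regex once up front and times repeated searches with the standard timeit module. Python's re module also caches recently compiled patterns internally, so the gain from compiling up front is usually modest; treat the timing as something to run on your own pages.

```python
import re
import timeit

# Compiled once at import time so the pattern is not rebuilt per call.
# The <h1> pattern is a made-up example, not a real site's template.
HEADLINE_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.DOTALL)


def find_headline(html_text):
    """Return the text of the first <h1>, or None if there is none."""
    match = HEADLINE_RE.search(html_text)
    return match.group(1).strip() if match else None


def time_headline_search(html_text, repeats=1000):
    """Return total seconds taken by `repeats` searches of the same page."""
    return timeit.timeit(lambda: find_headline(html_text), number=repeats)
```

Running time_headline_search on a typical page from each source gives a baseline to compare against if you ever do try a parser.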

In the end, researching a non-regex-based solution to this problem isn’t exactly a waste of time. But you simply cannot ignore the simplicity this solution offers in this type of industry-specific situation. Most of the arguments I’ve seen against regex-based parsing of HTML revolve around hypothetical input scenarios, but I’ve rarely seen anything that I couldn’t write a regex for. Even something as bad as this (which was lifted from one of the biggest news sources around):

<span class="focusParagraph"><p>
<span class="articleLocatio</span>n">