April 28, 2013
How League of Legends Has Ruined E-sports (For Me)
December 13, 2012
So someone suggested I take a look at Ruby
I'm a Python guy, but I like Ruby.
There, I said it. I like Ruby. I'm going to say it a few more times to make sure I believe myself. I like Ruby. Why do I like Ruby, you may be asking yourself? Well, yourself, let me explain to you the things I like about Ruby.
Ruby is clear. When I read Ruby, it is obvious what everything is. I like how Why referred to Ruby as codespeak rather than a computer language, because Ruby has obvious parts of speech (and thank you Why for pointing that out to me) with decorations that make it very clear what they are. For example:
class Foo
  attr_reader :neck, :back
  @@khia = :awesome
  @@but = "She ain't no diamond princess."
  def initialize(myneck, myback)
    @neck = myneck
    @back = myback
  end
end
I look at that, and I immediately know what things are. In Python, I have to read words, and who likes to read words? I like that class variables are private by default, and that Ruby provides access control mechanisms like Java and C++, because, let's face it: access control by naming convention (a.k.a. relying on the good will of others) isn't access control.
In Ruby, like Python, everything is an object.
1.times { print "One is the loneliest number" }
if 1.==(1)
  "One! Singular sensation every little step she takes. doo doo doo doo doo doo doo"
end
Obviously, you aren't going to do comparisons like that, but it illustrates my point: even integers are objects with methods like == and times. Just. Like. Python. Though, there is no times() for integers in Python. *stares at picture of Guido on wall*
No, I don't have a picture of Guido on my wall, but I do have a picture of Jesus, my Lord and Savior. No, I don't have that either, I'm just silly, because it's 7:45am, and I've been awake for 3 hours and 45 minutes, and this sentence needed one more clause.
While we're talking about strings, here is another syntax difference that I am starting to like better in Ruby (extending class Foo from earlier):
def lyrics()
print "#{@neck} #{@back} etc."
end
#{ }. You can insert a block of code into a string. How novel! This is not a good idea, for reasons of readability, in my opinion, but it clearly defines what part of the string is code and what part is text. In contrast, Python convention mandates that you use the following weird ass syntax:
def lyrics(self):
    print "%s %s" % (self.neck, self.back)
Now, I understand that the first is a format string, but what the hell is that percent sign doing there? Also, it's expecting a tuple after it so if you're just passing one variable you have to do this shit:
def some_real_python_shit(self):
    print "%s" % (self.neck,)
Because (self.neck) isn't a tuple. I mean, mathematically speaking it isn't a tuple. It's just a variable surrounded by parentheses. IRREGARDLESS, I hate it; I have always hated it; I have forgotten the comma a hundred times or more. But, I do like format strings. Oh wait! I can do the EXACT SAME THING in Ruby.
def some_real_ruby_shit()
  print "%s" % [self.neck]
end
But, without the trailing comma. So, already Ruby, you are winning over my heart. Thank you for being sensible about what you put after the MYSTERIOUS PERCENT SIGN. If someone wants to explain to me the origin of the percent sign here, please, god, don't. I want to hate it until I die.
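For the record, here is a small Python sketch of why that trailing comma exists. The variable names are stand-ins for illustration; the `%` operator and `str.format` are ordinary Python.

```python
neck = "my neck"

# Parentheses alone do not make a tuple; the comma does.
assert type((neck)) is str
assert type((neck,)) is tuple

# With a plain string on the right-hand side, both spellings format the same way.
plain = "%s" % neck
wrapped = "%s" % (neck,)

# The comma starts to matter when the value is itself a tuple.
pair = ("my neck", "my back")
try:
    "%s" % pair          # % unpacks the tuple: two values for one %s
    failed = False
except TypeError:
    failed = True
safe = "%s" % (pair,)    # wrap it so the whole tuple is a single argument

# str.format sidesteps the tuple question entirely.
formatted = "{} {}".format(*pair)
```

So the one-element tuple is only strictly needed when the thing being formatted might itself be a tuple, which is exactly the case you forget about until it bites you.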
One thing that Ruby doesn't have that I will miss from Python is easy-to-read list comprehensions. If I want to do:
[ x**2 for x in range(1, 11) ]
I have to do:
(1..10).collect { |x| x**2 }
I am, ultimately, okay with this, though, as they are functionally equivalent.
BUT WHAT ABOUT GENERATORS? You haven't talked about generators yet?
Well, Ruby has generators too, and for the most part they are also functionally equivalent. There are some issues with state in Ruby generators (for example you cannot iterate endlessly over a list using a generator), but who the hell really does that? *looks at some of the worst Python code I've ever written in college back when I thought this was what generators were used for*
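A minimal Python sketch of that endless-iteration pattern, for reference. The function name `endless` and the sample list are made up for illustration; `itertools.cycle` and `itertools.islice` are standard library.

```python
from itertools import cycle, islice

def endless(items):
    """Yield the items of a list over and over, forever.

    Note: an empty list would spin forever without yielding anything.
    """
    while True:
        for item in items:
            yield item

words = ["neck", "back"]

# Take only the first five values; without a limit this would never stop.
first_five = list(islice(endless(words), 5))

# The standard library already ships this behavior as itertools.cycle.
same_thing = list(islice(cycle(words), 5))
```

Both lists come out identical, which is a decent hint that the hand-rolled version was never needed in the first place.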
Basically, all of that sweet, juicy functionality I had with Python I still have with Ruby (and then some, really). The rest is just syntax and sugar.
Am I going to become a Matz convert? I don't know. Talk to me in a year, maybe.
November 11, 2012
Designing my Hadoop cluster
The cluster hosts run Debian. I chose Debian because the project has consistently delivered an extremely high-quality GNU/Linux operating system over the years. I like the core philosophies behind the Debian project (see the Debian Social Contract). I like the Debian package management system as well (apt+dpkgs). Running an apt server is incredibly easy, and building dpkgs is easily automated in build systems like Make or Ant.
For all of my data processing, I intend to use Cassandra on top of Hadoop. Cassandra was originally developed at Facebook, and has since been well received by the Hadoop community. Though the choice between HBase and Cassandra was difficult (and finding objective opinions about them is largely impossible), I chose Cassandra because of its philosophy that you don’t have to make one blanket decision about consistency and persistence. I like that Cassandra lets you configure the consistency level when making commits to the database.
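To make the consistency trade-off concrete, here is a back-of-the-envelope sketch in Python. It does not talk to Cassandra at all; it only encodes the usual rule of thumb that a read is guaranteed to overlap the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The function names and the replication factor of 3 are assumptions for illustration.

```python
def quorum(replication_factor):
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def reads_see_latest_write(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor

RF = 3  # hypothetical replication factor

# Write to one replica, read from one: fast, but a read may miss the write.
fast = reads_see_latest_write(1, 1, RF)

# Quorum writes and quorum reads: 2 + 2 > 3, so the sets always overlap.
safe = reads_see_latest_write(quorum(RF), quorum(RF), RF)
```

The point is that the choice is made per operation, so cheap writes and careful reads can coexist in the same cluster.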
I want to automate the installation process for new nodes using FAI. FAI allows for extremely flexible configuration of newly installed nodes, but I want to use it primarily for operating system specific installation and configuration. Tasks such as Hadoop node configuration, I want to leave to CFEngine. CFEngine allows for the configuration of many different classes of machines as well as centralized file management through SCM like Git. Configuration management and application distribution will be done via a combination of CFEngine and a centralized apt server.
Now, all I need is an app to build atop my cluster. I’ve been in the planning stage of this project for a few weeks, and I’m really excited to start building all of the images in VirtualBox in the coming weeks. Next up: host hardware configuration decisions, filesystem choices, filesystem layouts, and network architecture.
September 10, 2012
A random perspective on CFEngine
At its heart, CFEngine is a file distribution system. It supports package installation, file manipulation, and simple file transfer from a version-controlled repository. Before we tackle the method of configuration that displeases me the most, let's discuss what CFEngine does exceptionally well in package installation and file manipulation.
Hosts are grouped into classes which define CFEngine behaviors specific to those classes. Each host can belong to multiple classes, and overlaps between classes aren't avoided--so it's best to put some effort into the development of your classes. When defining classes, it is probably best to do so based on the machine groups' basic functionality. For the most part this step simply requires some planning--not only of your CFEngine configuration, but also of the configuration on hosts. For example, it's possible to have a basic "Web Server" configuration for multiple web services groups, and then use Apache's "include" directive intelligently in CFEngine sub-classes that depend on "Web Server." I choose to think of classes hierarchically.
Package management in CFEngine is brilliant and flexible. Versioning of packages can be handled on a case-by-case basis in classes. This makes it easy to have "production" and "beta" server groups where production is held steady at the latest production code release, and beta groups simply have the latest package version installed. Your base "server" class may install Perl, but a sub-class, let's call it the "I hate Larry Wall" class, may wish to uninstall Perl. CFEngine allows you to do just that. It's pretty intuitive, and does everything that you'd hope a configuration management solution would do in regards to package management. It even goes so far as to allow you to work with multiple package repositories and distribution mechanisms.
File manipulation is as flexible as it needs to be and provides a number of methods for editing files in-place on systems. For example, CFEngine provides the ability to append lines if they don't exist, remove lines or words matching regular expressions, and even supports the use of file templates. It's not necessarily intuitive, and it is sometimes non-deterministic, which is, of course, everyone's favorite thing when it comes to configuration file editing, but such situations can be easily avoided by using file editing only minimally.
Any file distributed by CFEngine comes from a version control system. As of this moment, CFEngine natively supports CVS and Subversion with a plug-in that allows it to support a Git back-end. Let's assume, for the sake of simplicity and familiarity, that you are a glutton for punishment, hate your life, and do not like productivity--you will likely choose CVS as your back-end storage mechanism. The CVS lifecycle for files controlled by CFEngine is this:
1. Check-in to CVS
2. `cvs update` CFEngine's copy of the repository
3. Host(s) check CFEngine for file changes
4. Host(s) receive updated version of file
This is the simplest and most naive method of distributing files from CVS. It is convenient, flexible, and incredibly easy to use. CFEngine provides you with mechanisms for ensuring proper ownership and permissions of files. It even lets you change the name of the file in transit--just for fun if you want!
With this functionality comes the temptation to distribute code to production systems via this mechanism. "It's so easy," you exclaim to yourself! All you have to do is check in new code and voila, it's on every production system you maintain. Let's think about that sentence one more time:
All you have to do is check in new code, and it's on every production system you maintain.
Let's pretend you're having a Benadryl day, and you commit to CVS without running `make test`, for example. Forty-five minutes later, Jeff, your manager, calls to ask why authentication is down. You realize you were just working on the authentication module. You look at the cvs diff from HEAD to the previous version (which takes you 30 minutes because you use CVS), and realize that you checked in code that ran, but did not do what it was supposed to do--at all. Instead of querying LDAP to see if a user had permission to view a particular page, it simply returned zero, because your new function forgot to assign to the value it was returning--and instead returned the initialized value of 0.
Whoops.
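A hypothetical Python sketch of that bug, and of the tiny check that would have caught it before check-in. Everything here is made up for illustration: `FAKE_DIRECTORY` stands in for the LDAP query, and none of the function names come from a real code base.

```python
# Stand-in for an LDAP lookup: user -> pages they may view (hypothetical data).
FAKE_DIRECTORY = {"jeff": {"/reports"}}

def can_view_broken(user, page):
    allowed = 0                                # initialized value
    page in FAKE_DIRECTORY.get(user, set())    # result computed, never assigned
    return allowed                             # always 0

def can_view_fixed(user, page):
    allowed = 0
    if page in FAKE_DIRECTORY.get(user, set()):
        allowed = 1                            # the assignment that was forgotten
    return allowed

def check(can_view):
    """The kind of check `make test` would run: one allowed case, one denied case."""
    return can_view("jeff", "/reports") == 1 and can_view("jeff", "/admin") == 0

broken_passes = check(can_view_broken)   # the broken version fails the check
fixed_passes = check(can_view_fixed)     # the fixed version passes it
```

The broken version runs without complaint, which is the whole problem: only a test that exercises an allowed case notices that nobody is ever allowed.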
The point of this elaborate, over-worked example is this: distributing production code directly from CVS bypasses any sort of quality assurance methods you may or may not have in place. It does not allow for testing code before putting it into production--unless you're doing development in a branch and copying code to a test machine in some, hopefully automated, manner.
This can all be easily avoided by reminding ourselves of one simple fact. What does CFEngine do really well? Package management.
Version control systems are really great for managing code. You check things in, you test them, and then you ship them. That's the life cycle. Checking them in and shipping them in the same step, however, is dangerous. Rather than using CFEngine as a distribution mechanism for files, fall back to distributing your code via packages--like any sane person would do. It's very simple to add a 'make dpkg' to your build script, and then have it shove the dpkg over to your repository. CFEngine running on hosts will see the new version and pull it. True, you could still introduce bad code, but at least it takes extra steps--and you can check in your code without harming production systems.
I fully realize that testing should happen before check-ins, but anyone who has ever worked with a build bot will know that people break the build. When light-up Jesus turns on above the whiteboard, you know that someone checked in code that broke the build. It happens. Accept it, and then put in some controls that make sure that broken build doesn't make it onto production systems.
June 12, 2012
Surviving the Non-technical Interview
Preparing for a technical interview is easy. If you've done them before, you have an idea of the kinds of questions to expect. It's simply a matter of recalling where you were strong and where you were weak in the last interview and then preparing appropriately. This may mean brushing up on those B+ trees or remembering just how Aho-Corasick works. It may mean remembering how to determine the probability of a four-of-a-kind hand in poker. These are all handy brain teasers that will get your mind in the right place for the Technical Interview.
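As a warm-up, the four-of-a-kind question works out as follows. This is a minimal Python sketch assuming a standard 52-card deck and a five-card hand; the `choose` helper is written out by hand so the arithmetic stays visible.

```python
from math import factorial

def choose(n, k):
    """Number of ways to pick k items from n, ignoring order."""
    return factorial(n) // (factorial(k) * factorial(n - k))

total_hands = choose(52, 5)    # every possible five-card hand
four_of_a_kind = 13 * 48       # 13 ranks for the quad, 48 choices for the fifth card
probability = four_of_a_kind / float(total_hands)   # about 1 in 4,165
```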
Non-technical interviews are a completely different beast. Besides the obvious questions that the HR representative will ask you at the start of the day, there is no way to know in advance what will be asked of you that day. Furthermore, in the midst of all this non-technical talk you may be asked some technical questions in disguise. It's an interesting interview mechanic, and I've come to appreciate it: lure the interviewee into a sense of security and then sneak a challenging technical question into the course of normal conversation. If you aren't in the technical interview mindset, you may not give a well-reasoned answer. It's a good mechanism to see how you think on your feet--not how you think when buried in the trenches of Test Mode.
I consider my performance on the non-technical interview to be fairly lackluster. I think they may still consider hiring me, but I did not impress myself--so I don't think I impressed them, either. I was friendly, likeable, and got along with everyone there (I think), but at the same time, I feel I could have made better use of my time. So, in an effort to formalize my thoughts and to help other people who may not be that great at the non-technical interview, here are a few suggestions (some may be more applicable to your personal experience(s) than others). Keep in mind that a non-technical interview could come out of left field when you arrive, and that some of these suggestions are probably good for interviews in general.
Make a list of questions about the work environment ahead of time. I am applying for software development positions, so I might ask about things like: version control systems, operating systems, development environments, development methodologies, their success with those development methodologies, how are milestones decided upon, or what is the role of the technical lead in project groups? If you have the opportunity to ask questions, use that time wisely.
The people I interviewed with were under NDA, so I couldn't ask about the project specifics, but I realize now that I could have asked around the project. If the interviewers can't answer specifics about the product or what it does, try to find out as much information as possible about the technologies being used. What operating systems are they developing in? What databases are they using? Is their application RESTful? What APIs are they using? What libraries are they using? What's the product architecture like? Is it an MVC design? What languages are they using? Have they encountered any issues in (your particular area of expertise)?
If you have any inkling of an idea as to what they're working on, you have the opportunity to show them that you've been considering their product and what it could be--as well as how you might be able to help them or fit into the team. So go prepared to do so. Make a list of questions and take it with you in your little interview notebook. If you interview with more than one person, ask each of them the same questions as they will likely give you different answers. Some interviewers may give you more information than others.
The other pitfall of the non-technical interview is the hidden technical question. For example, while I was interviewing with a couple of the people that day, we got into a fairly familiar mode during the course of our conversations. It was in those points of familiarity and jocularity that they would pose various technical questions. Watch out for these. When they occur, be sure you recognize them, and consider your answers. Don't fall into the relaxed-conversation trap of saying one of the first things that comes into your mind.
I have no realistic idea of my performance during my interview at MITRE. I hope that I get the job, though. I also hope that I can remember all of this the next time I show up for an interview and it turns out to be an informal, personal interview. I also hope that I have the intellectual wherewithal to use this lesson learned in my next technical interview.
May 9, 2012
Graduation
I feel like I should have a plan. Whenever I think of corny, worthless interview questions, the first that comes to mind is, "Where do you see yourself in five years?" I'll tell you where I see myself in five years: out of the South, as far West as possible, and much happier for it all. An attempt at the solidification of details at any finer precision would be futile. This is the beginning of a good plan.
Another place I see myself in five years is in an apartment/condo/loft, by myself, with a pet. Not just any apartment, but a clean apartment with a studio for a living room. One wall will be mirror, one will be whiteboard, one will be bookshelves, and the other will be electronics. I require one chair for reading and one for rolling around the electronics wall. That is my living room. I see that in five years.
Money invested in some kind of growth fund and diversified in stocks would also be a good place to be in five years. It would be good to have some money saved up in the case of an emergency early retirement. This is why my apartment is going to be furnished in such a spartan manner, and why it will be only a studio or one-bedroom. I don't intend to grow old anytime soon, but it is, as they say, "better to be safe than sorry."
Los Angeles, San Francisco, Portland, and Seattle. I see myself in one of these cities in five years. The list is prioritized, but truly LA and SF share first place. I think I am more of an SF person, but I know there is quite a bit of Los Angeleno in me. After all, the Museum of Jurassic Technology is in LA. How can I possibly go wrong? Then again, San Francisco has wine country. I think I win regardless of which I choose.
You'll notice I have made no mention of what I will be doing professionally in five years. I haven't the faintest idea what I'll be doing in five years. I've come to the realization that I can pretty much do anything. I don't mean that I can work anywhere or go to school anywhere, but if I decide to go get a Ph.D., I am 100% certain that it will happen--regardless of the subject matter chosen for research. If I stumble back into the arts, I am absolutely certain that I will be wildly successful. If I get another job as an engineer, I will be great at that, too. It doesn't matter what I do--so long as I love what I'm doing.
Right now, I love what I'm doing. I'm moving to New Mexico to work at Los Alamos National Laboratory. I was offered a one-year post-baccalaureate fellowship, and I happily accepted. There is the possibility of permanent employment at the end of the fellowship, and I'd like to continue working at LANL. I am in Network and Infrastructure Engineering, and my projects revolve primarily around automation, report generation, vulnerability assessment, and intrusion prevention. It's incredibly challenging, demanding work. National Laboratories present interesting security problems. I hope I can create a position for myself to assist them in their mission.
July 22, 2011
Vim trick to generate POD
Part of the API I'm writing involves a lot of class member variables. For class variables, I use file-scoped lexicals (from man perltooc):
package Some_Class;
my %ClassData = (
    CData1 => "",
    CData2 => "",
);
for my $datum (keys %ClassData) {
    no strict "refs";
    *$datum = sub {
        shift;  # XXX: ignore calling class/object
        $ClassData{$datum} = shift if @_;
        return $ClassData{$datum};
    };
}
So what's the big deal, you ask? Well, it lets me do things like this:
my $foo = Foo->new();
$foo->CData1("foo");
etc.
So, generating documentation for these is pretty trivial. I do inline POD, so it'll look something like this:
=head1 MEMBERS

=over

=item B<CData1([$CData1])>

=item B<CData2([$CData2])>

=back

=cut
Or you can replace the $CDatax with a variable type, if you want.
I'm working on a class right now with about 20 member variables. I wanted to generate documentation for all of the member variables, so that you could do perldoc MyClass to get the list instead of having to look at code.
There is a very easy way to do this! I use vim as my editor, so I did the following:
- Visually select all of the contents of the %ClassData hash.
- Copy the text (y) and paste it (p) into a documentation block
- Highlight all of the pasted text
- Unindent it (<), then reselect it (gv)
- :s/\(.*\) =>.*/=item B<\1([$\1])>\r (typing : with a selection active fills in the '<,'> range for you)
- Enter
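If you'd rather not do this by hand in the editor, the same transformation can be sketched in a few lines of Python. This is an illustration rather than part of the workflow above: the function name `pod_items` and the sample input are assumptions, and the regular expression mirrors the vim substitution (capture the key before `=>`, drop the rest).

```python
import re

def pod_items(hash_body):
    """Turn lines like 'CData1 => "",' into POD =item entries."""
    items = []
    for line in hash_body.splitlines():
        match = re.match(r"\s*(\w+)\s*=>", line)
        if match:
            name = match.group(1)
            # A trailing newline per item, like the \r in the vim command.
            items.append("=item B<%s([$%s])>\n" % (name, name))
    # Joining with a newline leaves a blank line between items, as POD expects.
    return "\n".join(items)

sample = '''
    CData1 => "",
    CData2 => "",
'''
pod = pod_items(sample)
```

Unlike the vim version, this one strips leading whitespace itself, so the unindent step is not needed.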