
If I sharpen this knife enough, I'll have a great looking handle

Apr. 1st, 2009 | 11:27 pm
mood: detail-oriented

lambda { |f,*x| 0 while x=f[f,*x] } [ lambda { |f,*x|
puts ["what was it", "is it a #{x[0]}",
	"what might I ask to tell a #{x[1]} from a #{x[0]}", x[2]][x.size]+"?"
( 5>(x<<gets.strip<<x[-1]["n"]).size ?
	(2<x.size && x.pop && 3!=x.size ?
		3==x.size ? x:[x])[1..-1]:
	[f,lambda{|*y| y[1..-1]}].method(x.pop ? "dup":"reverse")[
		].enum_with_index.map {|g,i| g[f,*x[i]]}<<x[-2]
).unshift(x[-2]).last 3}]

I'm a little obsessed with the above code. This is the animal guessing game in Ruby ("think of an animal and I will try to guess what it is. Is it a pet? Is it a dog? No? What is it then? How can I tell a cat from a dog? Play again? Is it a pet? Does it meow? etc..")

The version above is adapted from one I posted originally to Facebook a year and a half ago. I've thought of a few new tricks with arrays since then, and it's now some of the densest code I've ever written. I would like to eliminate the two checks of the length of the x array, but I don't think that's possible. I suspect I could come up with something more terse than enum_with_index as well - it's been a while since I've looked at that part.

There is no practical application for this at all. Well, ok - I now understand arrays very well indeed, and I especially like the first / last method pair, as it will not fail when you ask for more of an array than is available.

addendum: shaved off a few more characters! It looks like I can live without method(), though that was an elegant approach... except for resorting to dup.

lambda{|f,*x| 0 while x=f[f,*x]}[lambda{|f,*x|
puts ["what was it", "is it a #{x[0]}",
	"what might I ask to tell a #{x[1]} from a #{x[0]}", x[2]][x.size]+"?"
( 5>(x<<gets.strip<<x[-1]["n"]).size ?
	(2<x.size && x.pop && 3!=x.size ?
		3==x.size ? x:[x])[1..-1]:
	[f,lambda{|*y| y[1..-1]}].values_at(x[4]?0:1,x[4]?1:0
		).enum_with_index.map {|g,i| g[f,*x[i]]}<<x[2]
).unshift(x[-2]).last 3}]

addendum #2: this is the final version, five days later. I really can't believe how much time I've spent on this.

lambda{|f,*x|0 while x=f[f,*x] }[lambda{|f,*x|
puts ["what was it","is it a #{x[0]}",
	"what might I ask to tell a #{x[1]} from a #{x[0]}",x[-1]][x.size]+"?"

case (x<<gets.strip<<x[-1]["n"]).size
when 3;	[x[0]]+(x[-1]?f[f,x[0],f[f]][1..-1]:[])
when 5;	[0,1].zip(([lambda{|*y|y[1..-1]},f]*2)[x[-1]?1:0,3]).
		map{|i| i.pop[f,*x[*i]]}<<x[2]
else;	x[0..-2]
end }]

addendum-de-dum - no more. Really. After this.

lambda { |f,*x| 0 while x=f[f,*x] }[ lambda { |f,*x|
puts ["what was it", "is it a #{x[0]}",
	"what might I ask to tell a #{x[1]} from a #{x[0]}", x[-1]][x.size]+"?"
case (x << gets.strip << [f,lambda { |*y| y[1..-1]},f][x[-1]["n"]?0:1,3]).size
when 3; x[-1][2] ? f[f,x[0],f[f]] : x[0]
when 5; [0,1].zip(x[-1]).map{ |i| i.pop[f,*x[*i]]} << x[2]
else x[0..-2]
end }]
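For anyone who would rather not untangle the golf, here is a readable sketch of the same game. The names are mine, not anything recovered from the one-liner, and it keeps the question tree in a Struct where the golfed versions juggle nested arrays:

```ruby
# The guessing tree: a leaf holds an animal name; an internal node holds
# a question, with subtrees for the yes and no answers.
Node = Struct.new(:text, :yes, :no)

class AnimalGame
  def initialize(input = $stdin, output = $stdout)
    @input, @output = input, output
    @root = Node.new("dog") # the game is born knowing one animal
  end

  def ask(text)
    @output.puts "#{text}?"
    @input.gets.strip
  end

  def play_once(node = @root)
    if node.yes                                   # internal node: a question
      play_once(ask(node.text) =~ /\An/i ? node.no : node.yes)
    elsif ask("is it a #{node.text}") !~ /\An/i   # leaf: guessed right?
      nil                                         # we won; nothing to learn
    else                                          # guessed wrong: grow the tree
      actual   = ask("what was it")
      question = ask("what might I ask to tell a #{actual} from a #{node.text}")
      node.yes, node.no = Node.new(actual), Node.new(node.text)
      node.text = question
    end
  end
end
```

Loop `play_once` forever and you have the game; the one-liner does that with the `0 while x=f[f,*x]` wrapper.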


Forward error correction and JPEGs

Mar. 12th, 2009 | 05:26 pm
music: CONNECT 14400

A fun project to work on and to serve as a carrot to enjoy after I've finished writing a history essay and studying for two exams: characterize the noise floor of a communication medium defined as a fixed-size JPEG-compressed bitmap, and determine the suitability of a forward error correction method to approach the Shannon limit. Questions:

  • how can the size of a given JPEG file be related to the amount of recoverable meaningful data it contains?
  • what is the most efficient approach: non-error corrected storage, FEC storage, distribution of a "correction list" of patches with the compressed bitmap, or a combination of the last two?
  • (trivial) is JPEG to bitmap decompression well-defined and invariant, or will certain inputs yield varying outputs given standards-compliant decompressors?

It would be ridiculous to claim that there is any practical application for this, so I won't. I thought it might be fun to abuse a flickr account for data storage; the interesting and unsolved problem I have is the distribution of large amounts of data on the internet to many recipients. How many thousands of JPEGs are needed to reliably reproduce a DVD? (it gets more interesting once we wander into the uncertain areas, since we can test decompression to make sure that the intended result is the one we get. Perhaps the input can be fine-tuned to yield the correct result even when complete, correct reproduction of purely random input is unlikely. Of course this requires well-defined decompression.)
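To make the FEC half concrete (and only that - this has nothing to do with JPEG yet), the crudest possible code is triple repetition with a majority vote. Anything serious would be Reed-Solomon or similar:

```ruby
# Triple-repetition code: the most naive FEC there is. Each bit is sent
# three times and the decoder takes a majority vote, so any single
# flipped bit per triple is corrected. Terrible rate (1/3), but it
# shows the encode/corrupt/decode cycle.
def fec_encode(bits)
  bits.flat_map { |b| [b, b, b] }
end

def fec_decode(bits)
  bits.each_slice(3).map { |triple| triple.sum >= 2 ? 1 : 0 }
end
```

Real schemes buy the same protection with far less redundancy, which is exactly the Shannon-limit question above.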

I think I'm going to have to give up if I run into fields, though.


no volts: a plane, stopped trains, and a bicycle

Jan. 17th, 2009 | 09:14 am
mood: i need a shower

Thursday evening was already an eventful time; N. was packing for her little jaunt and, well, this is always a little stressful no matter how well prepared you are - and she prepares very well indeed. This turned out to be a good thing and she had most of her packing done well in advance, where I usually procrastinate and stuff everything in arm's reach into my luggage in the last terror-filled half-hour before I (absolutely, positively, can't even hope to make it past this time) have to go.

A pleasant enough evening with a passable dinner of my making and Irished-up coffee to follow (classy, I know). We even handled a minor crisis when the microwave and (new) coffee maker drew too much current and blew a fuse. Replaced the fuse and serialized that operation. Then, two hours later, the lights went out.

"Shit! The fuse blew again?" Well, that was a reasonable conclusion. We have the bias of the earlier event, and the fact the computers are still running. On the other hand - all of the lights are out, only the battery-powered computers are running, and they do seem desperately unhappy (especially my poor laptop, which is essentially on heart-lung bypass with the non-battery-powered server). We still haven't realized the magnitude of the situation, so... "Oh, no - did I knock out the power by ironing and making muffins at the same time?" Hmm, possible but - oh, look, the building across the street has no power! And the lights at the subway station are out, except emergency lighting (although, really interestingly, trains continued to run until the regular subway closing time). Ah. This is the real deal. And it might take a while before the lights come back on. Ok. Flashlights, and then candles.

The whole event seemed very apocalyptic. This obviously wasn't a full-scale outage as streetlights were on and other buildings in this vicinity clearly had power, but we could tell that a few thousand people were affected. We didn't grasp the magnitude of the outage until much later; as it turned out, we were in the middle of a quadrant bounded by Spadina Ave, St. Clair Ave, Jane St., and Queen St W / Queensway. In other words, west Toronto. Bear in mind though that these details would all have to come later; the only link to the outside world that we had was our cellphones, which, really, is not so bad, but it's a lot less than we're accustomed to.

Anyway, I had a feeling that it might be a while, and it was. I think the candles started making the air a little stuffy - it didn't help that half of them were pumpkin-scented or whatever - so ventilation wasn't working. We live in a fairly large apartment building so heat wasn't really a problem, and there was enough water in both the hot-water and cold-water reservoirs to last throughout. The hot water was decidedly lukewarm in the morning, though, which made for a cold and short shower. The elevators were out, of course, and they were hopelessly confused when the power came back, so I had the pleasure of carrying N.'s luggage down fourteen flights of stairs. It was down, at least.

As I've jumped ahead, it's already apparent that we were among the lucky ones to have power restored after only a few hours. We woke up early with the lights still out, with the intention of taking the subway to the airport. That wasn't an option, so we drove. The drive back is worthy of a blog entry itself. I felt a little like I was plowing through zombies like Will Smith in I Am Legend. (seriously though, the police did a pretty commendable job of keeping a huge crowd from getting out of hand on what was a cold day - for Toronto, that is) The subway was restored in time for me to take it to work, but I decided to ride my bike anyway. I've never biked in sub-zero temperatures, so this was a first for me, and probably a little reckless since it was about -15. It worked out okay. Changing was a hassle, but my sweatpants+old jeans approach seemed to work. It was really just about the same as skiing on a cold day. Extremely tiring on the way back. My thighs started tingling at the halfway point, and when I hit a massive pothole, condensation formed on the seat while I dismounted to collect my bike locks (yes, two of them, both violently ejected from my bike rack as I traversed the pothole).

It was a pretty exciting day. I got home, disrobed, had a lovely cup of tea, and fell asleep with the lights on. I think I've left out quite a lot but that is the gist of it. Anyone else had a fun day?


We used to call them ISA slots, and they had jumpers, and....

Jan. 14th, 2009 | 01:49 pm

So it seems like desktops have had their day, which more or less confirms my experiences at the fruit stand. It used to be that a laptop (when did they stop being notebooks, anyway?) was a semi-useful ancillary to the real computer, invariably a PC with a 60 lb. CRT, modem, and giant tower or imposing desktop. Wired mouse, profoundly terrible ergonomics - not that laptops have improved that situation!

I feel a little sad about that, and I hope it won't make PCs prohibitively expensive for those who tinker. Then again, most of what I've always played with is essentially dumpster salvage. I don't know what I'd do with a new PC. I think I'd look at it in wonder, at first, at how shiny it was and how everything worked - then I'd scratch my head and think how stupid it is that I can't take it with me to school. I probably wouldn't play games on it. Part of living with N. is that there's always someone to play Wii with, and I guess I could get back into FPS but, well, I think they'd be more fun on a laptop anyway. And, not so ridiculously anymore, laptops can play games to my satisfaction. I still think Half-Life is pretty amazing, so that's more or less where I am with gaming.

Laptops seem to be opening a new market in the sub-$500 category that PCs never really addressed. You could build a PC for that amount, but it seemed that they were such unsatisfactory and bland machines that they didn't particularly catch on. The build quality was lousy too, and they were essentially blank slates. Great for Linux, but not so good for much else.

Well, whatever. With enough space and money, I will always prefer a desktop on the... desktop. But I guess I have to affirm the zeitgeist in agreeing that if I had only one computer, it would be a laptop. All that this proves is that I'm a hopeless geek and I will always be outnumbered by my (hopefully loyal) computers.

oh, I ended up beating the odds on N.'s laptop after all. No, not easy, not a bit, and of course her reaction was - "what did you do? you did nothing, it looks this same!" Well, what do you expect after you wake up from brain surgery? Hyper-intelligence? The ability to smell emotions? Maybe... maybe the patient with 5% odds of survival complains about these things. I'm willing to bet that they don't, dammit. More on that another time. (computers and cloning, that is, not neurosurgeons vs. geeks)


Old computer recovery

Jan. 9th, 2009 | 12:51 pm

N. has a new notebook, and I'm transferring everything over from her old one. By everything, I mean putting in a Knoppix cd on the new one and booting the old one's Debian partition and netcat'ing the disk over. (as a side note, with a USB-USB transfer cable, my experiments showed the best throughput to be had by piping the sender's dd through gzip -1; the USB cable needs a lot of cpu from the old computer, so not much is available for compression except the gain from effectively run-length encoding empty space.)
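Roughly, the transfer looks like this (addresses, device names, and the port are stand-ins - substitute whatever the real machines use):

```
# On the new machine (booted from the Knoppix CD): listen, decompress,
# and write the incoming image straight to disk.
nc -l -p 9000 | gzip -d | dd of=/dev/sda bs=1M

# On the old machine (booted into its Debian partition): read the disk,
# compress lightly - gzip -1, since the old CPU is the bottleneck - and
# ship it over the wire.
dd if=/dev/hda bs=1M | gzip -1 | nc 192.168.0.2 9000
```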

It has been quite a pain so far and I haven't solved it yet. I'm actually pretty close to giving up if my next attempt fails. However, I learned a few things in doing this, and I don't really want them to clutter up my brain any longer, so I'm hoping that writing them down will expunge the details.

  • the Toshiba Satellite M30 (old computer) recovery DVDs have encrypted Norton Ghost images. The decryption program will not run on any other computer. It acts as a shell to the real ghost.exe program, passing it the password on an authentic computer. That password is 2533.
  • to get that password, (omitting many details) I had to start an instance of DOS on the old computer. Aside from how ridiculous this is, I ran into the problem that the USB key boot firmware on the Toshiba does not play nicely with anything that runs after it has done its job. This may have been fixed in a later revision. Anyway, forget about using a USB key, and use the grub4dos boot loader I'd installed long ago (in boot.ini, entirely in the Windows partition - very nice) to boot a FreeDOS image - specifically Balder. Load this floppy image into memory from the Windows partition using memdisk, which is part of syslinux and is in turn loaded by grub. Don't forget to copy the files needed to harvest the ghost password (details) into the floppy image.
  • oh, yes, use the loadcd batch file to enable CD/DVD support after FreeDOS loads
  • call the predata.bat file on the recovery DVD and don't forget to set TGHOSTPS as in os.bat before running tghost
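For the curious, the memdisk bullet above boils down to a grub stanza along these lines (partition and image filename are from memory, so treat them as approximate):

```
title  FreeDOS (Balder floppy image, loaded via memdisk)
kernel (hd0,0)/memdisk
initrd (hd0,0)/balder10.img
```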

The end result of this effort is that N. should be able to start using the new computer in exactly the same way she's used to using the old one. That's not so much to ask, really, is it? The new computer is preloaded with new software that we have no interest in using. I will have to use it myself on another system, but the theory is that this will contribute to my gainful employment at some point in the future.

The current technicality is that the old software can be restored onto the new computer, but it seems to lack drivers necessary to boot. The system provides a message with the code 0x7b (who says Unix is unfriendly? Have they checked with the competition lately?) that translates to "boot failure". Oh, ok. I guess I'll see what I can do about that then. Possibilities:

  1. try to fix the recovered system from a partition containing working software; Linux will not be useful for this, so I'd have to use the software that came with the new computer
  2. instead of recovering the old software, copy the old hard drive to the new one directly as I did on my first attempt after pre-installing appropriate driver software on the old computer
  3. buy a retail copy of the old software
  4. admit defeat and use the new software

The numbers in the steps above correspond to the severity of the insanity I'll be suffering as I progress through them, from start to finish.

Now I'd be so pleased if I could just forget all of this...


All of these hacks are yours, except Europa

Jan. 2nd, 2009 | 12:52 pm
music: (sshhh...)

A couple of things I'm working on right now:

  • LDAP for the home net - single-source user authentication
  • new laptop: Debian GNU/Linux-on-NTFS, i.e. ntfs-root (ntfs-3g should make this possible)
  • using a photo frame as an info panel for the server; I need programmatic USB disconnection for this to work
  • wedding site
  • boot-132 "Mac"

That should keep me busy for a little while.


A novel application for HTTP referer

Dec. 17th, 2008 | 12:00 pm

The HTTP referer (sic) header defined in RFC 2616, as far as I know, hasn't been used to define a path namespace for resources on a web server. This could be a useful application, not for the purpose of "link protection" (trivially bypassed, by the way) but to make content less fragile by lessening its dependency on server file layout. For instance, a web page like Astronomy Picture of the Day will contain relative links to images like this: <IMG SRC="image/0812/DoubleDumbbell_Lopez_c0.jpg" alt="See Explanation. Clicking on the picture will download the highest resolution version available.">. If the site were to be reorganized, for instance renaming the top-level image directory to something more inclusive of other types like media, every referring document being served would have to be scanned and rewritten. With a large enough base of documents, this automatic process would need some verification as there are bound to be problems.

As an alternative, consider something like <IMG SRC="pic_of_the_day" />. This eliminates the path structure entirely, placing the image at the top level. The obvious implementation would be an alias or symbolic link on the web server, but that accomplishes nothing and doesn't solve the problem of handling separate files (I propose that every "picture of the day" page reference the same pic_of_the_day). How then can the web server determine which file to send?

The referer comes in here. The server would have a table indexed by the referer that would return the appropriate document and Content-Type (note that pic_of_the_day has no extension, so the server will need to tell the client exactly what it is). My argument is that it is easier to maintain this table than to encode permanent references to media that may be moved; this should have other benefits too, which might be handy for client-side scripts that could benefit from having easily generated pathnames for referencing server resources.
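A sketch of the server-side table, with invented entries - the point is only that the key is (referer, name) rather than a filesystem path:

```ruby
# Lookup table: [referring page, resource name] => [file on disk, Content-Type].
# Every path and name here is invented for illustration.
RESOURCES = {
  ["http://apod.nasa.gov/",         "pic_of_the_day"] =>
    ["/srv/media/apod/latest.jpg",  "image/jpeg"],
  ["http://fruitcompany.com/store", "0"] =>
    ["/srv/media/store/item0.png",  "image/png"],
}

# Returns [path, content_type], or nil for a 404. Reorganizing the disk
# means editing the right-hand side of this table - the pages never change.
def resolve(referer, name)
  RESOURCES[[referer, name]]
end
```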

It seems strange that I haven't found an implementation of this. I'm pretty sure that more than a few sites fake this by maintaining a mapping between something like a /assets directory and the actual file, but they still include a reference in the static HTML in the form of a pathname and a proper filename with extension. I'd like to do away with this pretense and formalize the idea that the elements referenced by a page are entirely decided by the server.

There is a very nice consequence to this, which is actually where I started, working backwards to this solution: a strong division between static and dynamic pathnames will exist, which means that there will be no conflict between an address you'd like to establish on the server as a static reference (like my.server.com/~myname or apod.nasa.gov/pic_of_the_day) and dynamically generated references to server resources. The actual names that pages use to refer to server resources are entirely immaterial as long as they are unique with respect to one another; it would be perfectly valid for fruitcompany.com/store to contain <img /> elements that are sequentially named 0, 1, ..., n. The referer fruitcompany.com/store will allow the server to dereference the link and serve it with the appropriate Content-Type. If I ever decide that I'd really like to have the address fruitcompany.com/store/0 as a permanent link to a resource that I can distribute publicly, I'll set that up on the server, and the fruitcompany.com/store page source will simply start numbering its elements at 1.


Quick package search in Debian/Ubuntu/etc.

Dec. 13th, 2008 | 01:23 pm

This is a one-liner that I've found useful for searching Tag: fields in Debian's package lists. For instance, I might want to know about all the packages available that have to do with Ruby. I'll assume that any mention of the string 'ruby' in the Tag: field of a package warrants its selection.

egrep -hr '^(Package:|Tag:.+ruby)' /var/lib/apt/lists|grep -B1 ^Tag|grep ^Package|cut -d' ' -f2-|tr \\n \||xargs -I{} apt-cache --names-only search ^\({}\)\$|sort|less

The first egrep does a recursive search through the package lists that apt maintains. The -h flag suppresses printing the filename. The regex selects every line beginning with "Package:" and every line beginning with "Tag:" and containing the string "ruby". In the next grep, packages without a matching Tag: line are discarded. Another grep to clean that up, then cut out everything but the package name, tr newlines into a literal pipe, and pass the now-single line to xargs. That part is a little ugly, because apt-cache wants a regular expression and it's trying too hard for this purpose when all that is needed is an exact match of the package name. Insert an echo before the apt-cache invocation to see what it's doing. It's not a particularly elegant solution, but it works.
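The same pipeline unrolled into a function, parameterized on the tag substring and easier to tweak; it stops at printing the matching package names (the apt-cache step at the end of the one-liner is dropped here):

```shell
# Search the Tag: fields of apt's package lists for a substring and
# print the names of matching packages, one per line.
tagsearch() {
	lists=${2:-/var/lib/apt/lists}
	egrep -hr "^(Package:|Tag:.+$1)" "$lists" \
		| grep -B1 '^Tag' \
		| grep '^Package' \
		| cut -d' ' -f2- \
		| sort -u
}
```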

edit -- the original version still had an echo before apt-cache, which wouldn't have been useful. Fixed.



Partially-compiled source code storage

Dec. 12th, 2008 | 12:13 am

So I was tempted to think that storing source code in a parsed form (what I learned is called an abstract syntax tree, commonly abbreviated AST) might be a good idea. I thought this because:

  1. well-formedness is forced; syntactically invalid code cannot be saved (this is only the most trivial kind of correctness, though)
  2. metrics extracted from source code can be more valid: rather than counting lines, operations, objects, classes, etc. can be measured (if spoofing is a problem, though, this might just move it into a higher level)
  3. compilation will be marginally faster. I don't posit this as a realistic benefit, except perhaps for large projects, or specialized cases with large amounts of generated code (XWT's mips2java would be an example)
  4. arguments about source code formatting can be swept aside in one fell swoop. Simply set your AST-aware editor to whatever formatting style you'd like
  5. the original form could be exactly regenerated by recording comments, deviations from regular form, even storage of invalid code in "as-is" form
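Point 1 is cheap to prototype with Ripper, the parser frontend that ships with Ruby - Ripper.sexp returns nil when the input doesn't parse:

```ruby
require 'ripper'

# The gatekeeper from point 1: a store that refuses syntactically invalid
# source. (Real AST storage would also keep comments and formatting
# deviations; this only demonstrates the well-formedness check.)
def store_ast(source)
  ast = Ripper.sexp(source)
  raise ArgumentError, "refusing to save code that does not parse" unless ast
  ast
end
```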

However, a post on the Eclipse LDT (language definition toolkit - now defunct?) mailing list a few years ago makes a convincing argument that this is a waste of time (http://dev.eclipse.org/newslists/news.eclipse.technology.ldt/msg00027.html).

I have an intuitive notion that AST storage should be able to buy the programmer something in that it brings the source form of a program closer to its object code representation. At the very least, it should make it easier for a debugger to find errors in code, and relay messages to the editor or IDE that can be used to highlight sections of code or supply profiling data.

Unfortunately, existing tools like wc, grep, sed, etc., can't get anything useful from AST-represented source. They could still be used with a command-line tool to extract the original source, but this would still be problematic since the extracted text is not the canonical form of the source.

The appeal of this kind of persistence, I think, is that the strong link between statements, functions, classes and files may be weakened somewhat. One of the reasons for storing a project in multiple files is simply to keep individual units of source code manageable, and to allow patches to be merged in similarly manageable units. But if source is stored as a pool of ASTs, aren't files an optional structure? And, interestingly, couldn't we do a SELECT-like operation to extract code and automatically merge disparate source into a single file? This might be interesting for another project that wants to incorporate functionality without doing a lot of digging. It seems to me (I have nothing to support this statement but intuition) that the AST form should provide more useful information for an automated system to extract code with arbitrary granularity, where a single function and its supporting functions and definitions could be automatically identified.

Of course, all of this is possible with regular source code, since I have defined the regular and AST forms as equivalent. And with an excess of CPU and I/O for most projects, AST buys nothing and loses portability. Perhaps it would be useful for interchange, or as a method for marking-up source code or preparing it for publication (including it in a document for instance). The only way for AST storage to have any advantage is for it to restrict the programmer or force more information to be provided (or, compile code on the run as Visual Studio does, and identify dependencies at edit time).

(which I can't stand, since it distracts me and prevents me from writing pseudo-code in the middle of a function as I try to reason out a problem. I do like highlighting and not having to match parentheses by hand. I'm not against method completion and other inline documentation, but I keep the docs open all the time anyway. I do find myself doing something like object.methods.sort a whole lot in irb.)

Ok. I think I've convinced myself that this isn't a useful way to archive source code, and that we can get everything from the original source anyway as long as it's syntactically valid. It might be a different story for code that has been parsed and partially or fully compiled, but not (possibly, fully) linked. And precompiled headers are an old trick that makes sense, or used to back when computers were slow. I suppose it makes sense for continuous compilation.


Big Plans

Dec. 8th, 2008 | 10:56 pm

I'm getting a feeling that I need to shake things up; and really, things have been just too comfortable and stable for too long. Let's see... how best to mess that up?


I've been doing a fair amount of this lately, and it's been filling a gap. At the same time, I don't feel that I'm gaining anything by building up awesome retail skillz. And there are fairly exciting things happening in the Rails community. And lo and behold, alternate frameworks like Merb and Camping... and some very interesting things on the fringes, like Haskell and a Rails analogue in, believe it or not, Smalltalk!

Am I a relic, or something?

It blows me away, really, it does, when I read the writing of my so-called contemporaries and discover that they've never dereferenced a pointer, started a line of code with a number reachable by GOTO or GOSUB, or, hell, even allocated memory. When I think that I switched to Linux because the alternative, at the time, could be taken down by one misbehaving program - does any of that count for anything? That I've watched computing scale up by several orders of magnitude, programming all the way? I'd like to think that it makes me appreciate the conveniences we have now all the more. What it makes me feel is that too much of a premium is placed on programmer time. Zed Shaw writes about a coder who assumed each month of the year has 30 days. Firefox has inexplicably contracted out its non-volatile state to SQLite. (their claim that this makes the program more robust in the face of crashes is not borne out by my experience of corrupt .sqlite files)

Write less code?

Every single non-trivial project I've worked on has gone through a few re-writes. I've tossed more code than I've kept; I've condensed functions into single lines. Invariably, I've made the mistake of chasing problems that have nothing to do with the task at hand (my proudest achievement here is probably the time I decided to condense postal codes into a base-10 number in an incredibly elegant, general way - extensible to arbitrary magnitude - saving roughly 200K out of a 10+MB database). I've come up with some very suitable solutions as well, but these have been on the third or fourth attempt, and came only after significant reflection and insight gained through the first couple of tries.
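From memory, the postal-code trick was something like this (the real thing was more general; this is only the flavour of it). A Canadian postal code alternates letter-digit-letter digit-letter-digit, so it packs into a single mixed-radix integer:

```ruby
# "M5V 2T6" -> letter, digit, letter, digit, letter, digit.
# Treat each position as a digit in radix 26 or 10 and accumulate;
# the whole code fits in under 25 bits instead of seven bytes.
RADICES = [26, 10, 26, 10, 26, 10]

def pack_postal(code)
  code.delete(" ").upcase.chars.zip(RADICES).reduce(0) do |acc, (ch, radix)|
    acc * radix + (radix == 26 ? ch.ord - "A".ord : ch.to_i)
  end
end

def unpack_postal(n)
  RADICES.reverse.map { |radix|
    d, n = n % radix, n / radix
    radix == 26 ? (d + "A".ord).chr : d.to_s
  }.reverse.join
end
```

(unpack_postal returns the code without the space, which is easy enough to reinsert.)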

I guess the ratio of time to lines of code is pretty high for me. I've also found that excess time seems to cause the code to get somewhat dense, and honestly probably not easily readable by someone not completely familiar with the problem.

Am I smart enough to do what I'm trying to do?

Maybe, but I feel like I'm stumbling sometimes without the rigour of a solid CS education and hours spent solving problems in discrete math. My colleagues seem to do pretty well with an intuitive grasp of logic and algorithms, leaving the tough parts to library code. This is probably the right approach, until it is time to write your own library.

The other thing that bothers me is the difficulty of proving that a program is acting correctly. I think GEB started my thinking about this, and I'm aware that this is the life work of more than a few people. Intuition has its place, but there's nothing quite like saying "well, it seems to work" to reveal a lack of insight into how a program does its job. And even proofs aren't perfect: GEB (I seem to recall) mentions the proof of the four colour theorem, which is not realistically possible by hand; which seems reasonable enough, except that to accept the mechanically-generated proof is to accept the system producing the proof as being correct.

Whatever I end up doing...

...kick me if you ever see me:

  • using UUIDs for primary keys (it's not about the not-chance of collision - it's the laziness)
  • believing that objects and relational databases make sense together
  • believing that Worse is not Better
  • thinking that a century from now, we won't look back in sheer terror at the code that ran the world
  • saying that a database will solve everything (I used to think that this was a good solution for /etc; in my defence, I was 18. I wonder what Apple's defence is?)

Next entry will be ungeeky. I seem to have written only about my complaints in programming, as opposed to my big plans. You know. Big.

