Microcenter had an ad for cheap USB flash drives. Something like $10 for a 1 GB drive. I'm completely in love with the information density they offer, as well as their overall ruggedness, so I went ahead and grabbed a couple. They had two flavors: a Kingston DataTraveler 1 GB and a Microcenter-branded drive with U3.
It started with wanting to find out which of the two drives was faster. Using XBench, a benchmarking suite for Mac, I went ahead and ran some disk benchmarks. The results were pretty comparable, with the Microcenter-branded drive coming out a touch faster on uncached writes. (Results are below, as you'll see.)
Then, in the process of trying to get rid of the U3 "feature", I rediscovered that OS X's Disk Utility is a rather powerful piece of software. You can, among other things, create RAID arrays from a variety of different devices.
Sooo...
Plug, plug
Click, click
So more benchmarks were set up with the new flash drive RAID, and the hard drive on my laptop. The results, with the original data, are below:
Really, the whole playing around with drives and RAIDing was just shits-n-giggles. What really scares me is that I automatically started putting numbers into an Excel spreadsheet, pivoting out results, and making charts. I was halfway through specifying chart series before I realized what the hell was going on. Working in the commodities trading industry has done terrible things to my apathy.
Interpreting the results is sort of a repeated exercise in "duh", but the most obvious point is that the hard drive kicks the crap out of the flash drives in writing. What's also "duh", but notable, is that while the hard drive took a hit going from sequential to random large block reads, the flash drive (and flash drive RAID) performance was more-or-less unaffected. Surprisingly, the 256k block read performance of the flash RAID exceeded that of the hard drive in both the sequential and random cases.
Putting it all into practice, copying XVID movies over to the flash RAID allowed for skipping around the movie without playback delay, where the same files on the hard drive required some time to get going after each seek. I imagine this could be useful for video editing, or perhaps some database access situations. There's the usual worry about the life of NAND flash memory, but that doesn't really apply to reads, and it's now often stated that wear-leveling has resulted in flash drive lifetimes comparable to magnetic disks.
Overall, what really sticks out in this case is a general engineering observation about building systems that face a dichotomy. Rather than committing to one approach or the other, igniting numerous holy wars en route, the solution is often a hybrid. The most solid examples that come to mind are CISC/RISC (micro-ops) and the essential data storage dichotomy of fast+small versus slow+big (caching).
The key in any of these is to offset the approaches so that the weak points of one side coincide with the strong points of the other. The comparison between the flash RAID and the hard drive shows a dichotomy that could be exploited this way. An interface could be implemented where random reads are directed to the flash component of a drive, and sequential operations to the magnetic component. This routing could be decided by statistical examination of the access pattern, or just with some sort of miniature tournament model, where the interface issues the command to both the flash and magnetic components and takes the result from whichever returns first.
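For what it's worth, the tournament idea can be sketched in a few lines of Java. This is only a toy under loud assumptions: the class ReadRace, the device() helper, and the latencies are all made up for illustration (they are not the benchmark numbers), and each "component" is just a task that sleeps before answering. The only real API leaned on is ExecutorService.invokeAny, which hands back the result of a task that completed successfully and cancels the rest.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ReadRace {
    // Hypothetical stand-in for a storage component: it "serves" a read by
    // sleeping for a made-up latency, then reports its own name.
    static Callable<String> device(String name, long latencyMillis) {
        return () -> {
            Thread.sleep(latencyMillis);
            return name;
        };
    }

    // Issue the same read to every component; keep whichever answers first.
    static String race(List<Callable<String>> devices) {
        ExecutorService pool = Executors.newFixedThreadPool(devices.size());
        try {
            // invokeAny returns one successful result and cancels the others.
            return pool.invokeAny(devices);
        } catch (Exception e) {
            throw new IllegalStateException("no component answered", e);
        } finally {
            pool.shutdownNow(); // interrupt the loser so it stops "seeking"
        }
    }

    public static void main(String[] args) {
        // Pretend random read: flash answers quickly, the disk has to seek.
        String winner = race(Arrays.asList(device("flash", 5), device("disk", 200)));
        System.out.println("answered first: " + winner);
    }
}
```

A real hybrid interface would presumably want the statistical route as well, since racing every request keeps both components busy all the time; the sketch only shows the "first one back wins" part.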
I know it's livejournalish to post IM conversations, but I appreciate harsh truth:
[22:34:59] me: why am i doing this
[23:30:54] friend: why are you doing this?
[23:54:52] me: I don't even know
[23:55:22] me: but I do know that if I'm gonn do uncached random reads, a ghetto flash raid array is wonderful!
[00:06:55] friend: why can't you just have normal hobbies?
Edit: I'm never as clever as I wish I were:
ipod shuffle raid
hybrid hard drives
I'm pissed off, and I'm not sure who or what to blame.
See, right now, I'm trying to build a front-end to the UNIHAN database, which I've found to be profoundly useful for my East Asian language research; specifically the fact that it contains Tang Dynasty-era pronunciations for quite a few CJK (Chinese/Japanese/Korean) characters. The Mandarin, Japanese, and Sino-Korean pronunciations are also very nice to have. Unfortunately, the database itself is a massive UTF-8 flatfile, keyed by each character's hexadecimal Unicode code point (U+4F60 and so on), so it's not the easiest thing in the world to use.
So I'm using Java to make an application to allow me to type in a Han character, and get the UNIHAN data. What pisses me off, though, is the way that (Java/OS X/OS X's Terminal) handles Unicode output. It's messed up as hell, and I don't. know. why.
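To give a feel for what "keyed by code point" means in practice, here is a rough sketch of reading one line of the flatfile and of turning a typed character into its key. It assumes the tab-separated layout described in UAX #38 (code point, field name, value). The class name UnihanEntry and the sample value "ni3" are made up for illustration and not copied from the database.

```java
class UnihanEntry {
    final int codePoint;   // e.g. 0x4F60
    final String field;    // e.g. "kMandarin"
    final String value;    // the rest of the line, left unparsed

    UnihanEntry(int codePoint, String field, String value) {
        this.codePoint = codePoint;
        this.field = field;
        this.value = value;
    }

    // Parse one line of the assumed form "U+4F60<TAB>kMandarin<TAB>value".
    // Returns null for comments, blank lines, and anything else unexpected.
    static UnihanEntry parse(String line) {
        if (line.isEmpty() || line.startsWith("#")) {
            return null;
        }
        String[] parts = line.split("\t", 3);
        if (parts.length != 3 || !parts[0].startsWith("U+")) {
            return null;
        }
        int cp = Integer.parseInt(parts[0].substring(2), 16);
        return new UnihanEntry(cp, parts[1], parts[2]);
    }

    // Turn the first character of what the user typed into a "U+XXXX" key.
    static String keyFor(String typed) {
        return String.format("U+%04X", typed.codePointAt(0));
    }

    public static void main(String[] args) {
        // Sample line written for this example, not taken from the real file.
        UnihanEntry e = parse("U+4F60\tkMandarin\tni3");
        System.out.println(keyFor("\u4f60") + " " + e.field + " = " + e.value);
    }
}
```

Using codePointAt rather than charAt means characters outside the Basic Multilingual Plane would still produce the right key.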
Here's an example. Suppose I have code that looks like this:
System.out.println("你好");
System.out.println("\u4f60\u597d");
The first println works fine, and outputs the characters fine to OS X's Terminal. The second println, using the \u escapes for those same two characters (U+4F60 and U+597D, respectively), prints out two question marks. Cut-and-pasted, straight from Terminal:
你好
??
I shit you not.
This isn't exactly something anyone I know can help me with, either. The most helpful answer I've gotten is that there're some very rare variant characters that have a codepoint associated with them, but no glyph representation in a font. This I knew, but there's a slight issue with that explanation. Namely, that the characters "你好" are roughly the Chinese equivalent of "hello".
Well, that, and the fact that the characters printed out just fine ON THE LINE RIGHT ABOVE.
So I don't know whether to blame Java, OS X, or Terminal, all of which are purported to be Unicode-friendly. Actually, I think there's a strong chance that it's my fault, but we can pretend I didn't admit that. What I may just end up doing here is to declare a "screwit" situation, and just jump to the Swing app I was thinking about making, since that supposedly solves many issues.
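One guess worth checking, and it is only a guess since nothing above confirms it: System.out encodes text with the JVM's default charset, and any character that charset can't represent comes out as "?". If the default here isn't UTF-8, the \u-escaped string would turn into question marks, and the pasted literal could be surviving only because its raw bytes pass through untouched. The sketch below uses US-ASCII as a stand-in for "some charset without Han characters" and then prints through a PrintStream that is explicitly UTF-8. The class name UnicodeOut and the roundTrip helper are made up for illustration.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UnicodeOut {
    // Encode and decode with one charset, which is roughly what happens to
    // text on its way through a PrintStream. Unmappable characters are
    // replaced by the charset's replacement byte ('?' for US-ASCII).
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String hello = "\u4f60\u597d";

        // What System.out will use unless told otherwise.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Stand-in for a non-Unicode default: both characters are lost.
        System.out.println(roundTrip(hello, StandardCharsets.US_ASCII));

        // Wrapping System.out with an explicit encoding sidesteps the default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(hello);
    }
}
```

If the first line printed is something other than UTF-8, that would at least narrow down who to be pissed off at. Running with -Dfile.encoding=UTF-8 is another commonly suggested workaround, though whether it helps depends on the JVM version.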