Jimmy Tang

crowbar for deploying systems

I’ve been eyeing Crowbar recently; it looks pretty useful and interesting for deploying servers and applications. I haven’t seen much, if any, documentation out there suggesting that people in the digital preservation and archiving fields are deploying systems at scale; my impression is that most sites are building systems up one piece at a time without much automation. Crowbar seems to use Chef in the backend for all of the automation.

ceph v0.53 released

There’s a new release of Ceph. I hope they release a stable version soon so we can do further evaluations of the Ceph storage system. A few of my work colleagues are going to the Ceph workshop next week. I’m wondering if anyone has taken the CRUSH algorithm and used it in other domains.

hydracamp 2012 penn state

What do you do when you need a crash course on RoR, Hydra and frameworks for digital preservation and archiving? You go to Hydracamp! The syllabus was:
Day 1 - Rails, CRUD, TDD and Git
Day 2 - Collaborative development with Stories, Tickets, TDD and Git
Day 3 - Hydra, Fedora, XML and RDF (ActiveFedora and OM)
Day 4 - SOLR and Blacklight
Day 5 - Hydra-head, Hydra Access Controls
Most of the training sessions were hands-on from day 1, which was refreshing; being hands-on meant I got the most out of them.

digital preservation and archiving is an hpc problem

I shall be going to SC2012 next month. I plan on hitting a few of the storage vendors for possible collaborations and flagging to them that we’re on the lookout for storage systems. One of the first things the reader will ask is “where is the link between HPC and Digital Preservation and Archiving?” It’s probably not obvious to most people, but one of the big problems in preservation and archiving is the sheer amount of data involved and the varied types of data.

slurm bank, that big script for banking in slurm

A co-worker of mine (Paddy Doyle) had originally hacked together a Perl script for reporting balances from SLURM’s accounting system a year or two ago, and he figured out that with some minimal configuration and scripting it was possible to get a system that’s very basic but functional. Banking was just one of those things the funding agencies wanted in order to justify how the system was being used, and GOLD was clunky, obtrusive and complicated for what we wanted.
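For a flavour of how little is actually needed on the reporting side, here’s a minimal sketch (not the real slurm-bank code) that shells out to sshare and collects raw usage per account; the flags and column positions are assumptions about the SLURM version in use, so treat it as illustrative only.

```python
#!/usr/bin/env python
# Minimal sketch (not the real slurm-bank script): report per-account
# raw usage from SLURM's accounting/fairshare data via sshare.
# Assumes sshare is on PATH and supports -a (all), -P (parsable) and
# -n (no header), and that RawUsage is the fifth column; adjust for
# your SLURM version.
import subprocess

def account_usage():
    out = subprocess.check_output(["sshare", "-a", "-P", "-n"])
    usage = {}
    for line in out.decode().splitlines():
        fields = line.split("|")
        if len(fields) < 5:
            continue
        account, user, raw_usage = fields[0].strip(), fields[1].strip(), fields[4]
        if user:  # skip the per-user rows, keep the account-level totals
            continue
        try:
            usage[account] = int(raw_usage)
        except ValueError:
            pass
    return usage

if __name__ == "__main__":
    for account, raw in sorted(account_usage().items()):
        print("%-20s %15d" % (account, raw))
```

The whole point is to lean on the accounting data SLURM already keeps rather than run a separate allocation manager like GOLD.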

ceph v0.52 released

The latest development branch of Ceph is out with some rather nice looking features; probably the most useful are the RPM builds for those running RHEL6-like systems. Still no real sight of backported kernel modules :P Also, some of the guys in work here just deployed a ~200TB Ceph installation, on which I have access to a 10TB RBD for doing backups.

a poor man's nas device with ceph

Given that I have a number of old 64-bit-capable desktop machines and a collection of hard drives at home, I could have run Tahoe-LAFS as I do at work for backup purposes; in fact Tahoe works quite well for the technically capable user. Recently, though, I’ve decided that I need a more central location at home to store my photo collection (I love taking photos with my Canon DSLR and Panasonic LX5).

alternative to talend etl

I’ve used Talend ETL a few times; however, I recently came across http://datacleaner.org/. I need to take a look at it at some point to see whether it’s a real alternative to Talend, and whether it works on a Mac!

ceph v0.48.2 argonaut released

There’s a new stable release of Ceph Argonaut, though I seem to be having better luck playing with the development releases of Ceph. Oh how I wish there were a backport of the kernel ceph and rbd drivers for RHEL6. I have a dodgy repo and some reverted commits that one of the guys in work told me about; it seems to run but it isn’t great, and it can be found at https://github.

going from replicating across osds to replicating across hosts in a ceph cluster

I’ve learnt how to remove and add monitors, metadata servers and data servers (mons, mdss and osds) on my small two-node Ceph cluster, and I want to say that it wasn’t too hard to do; the Ceph website does have documentation for this. As the default CRUSH map replicates across OSDs, I wanted to try replicating data across hosts just to see what would happen. In a real-world scenario I would probably treat individual hosts in a rack as the failure unit, and if I had more than one rack of storage I would want to treat each rack as the minimum unit.
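The change itself is just a one-word edit in the decompiled CRUSH map: the chooseleaf step in each rule goes from “type osd” to “type host”. Below is a rough Python sketch of the round trip (pull the map, decompile, edit, recompile, push it back); the file paths and the exact rule text are assumptions about a default Argonaut-era map, so check your own decompiled map before injecting anything.

```python
#!/usr/bin/env python
# Rough sketch of the CRUSH map round trip: fetch the compiled map,
# decompile it, switch the chooseleaf step from "type osd" to "type host",
# recompile and push it back. Assumes ceph and crushtool are on PATH and
# that the rules contain the stock "step chooseleaf firstn 0 type osd" line.
import subprocess

RAW, TXT, NEW = "/tmp/crushmap", "/tmp/crushmap.txt", "/tmp/crushmap.new"

def run(*cmd):
    print(" ".join(cmd))
    subprocess.check_call(cmd)

run("ceph", "osd", "getcrushmap", "-o", RAW)    # fetch the compiled map
run("crushtool", "-d", RAW, "-o", TXT)          # decompile it to text

with open(TXT) as f:
    text = f.read()
# Replicate across hosts instead of individual OSDs.
text = text.replace("step chooseleaf firstn 0 type osd",
                    "step chooseleaf firstn 0 type host")
with open(NEW + ".txt", "w") as f:
    f.write(text)

run("crushtool", "-c", NEW + ".txt", "-o", NEW) # recompile the edited map
run("ceph", "osd", "setcrushmap", "-i", NEW)    # inject the new map
```

For racks the idea is the same: add rack buckets to the hierarchy and point the chooseleaf step at type rack instead.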