EpointSystemTransactionSafety

Transactional issues. Setting up disks, filesystems, PostgreSQL and backup

We do not want data loss even if the server hardware crashes. We must keep a transaction log and, when necessary, redo the relevant part of it (starting from a checkpoint) to restore an up-to-date state.

  • transaction log to different HW
    • preferably over the network, to minimize the chance of HW loss due to a broken power supply
  • the only practical way to protect against any single-HW crash (without making the issuer's latency prohibitively high) is to access the issuer through a gateway that logs documents in both directions (the documents are often encrypted, so the gateway cannot understand them; that is OK).
    • Otherwise the client could receive a certificate from the issuer that is then lost in an issuer crash, which would be grave.

Transaction log example (just add/replace: no deletes)

<marker>1. inodenr <optional:path/file> length timestamp hash <data>

<marker>2. ...

Such a log can be played back at any time, restoring a certain set of missing data. Operations:

  • find latest complete record
  • seek to record srl=N
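
The two operations above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the page does not define the on-disk encoding, so the MARKER bytes, the fixed big-endian header and the use of SHA-256 over header, path and data are all hypothetical, and the field order only loosely follows the example record. A record counts as "complete" when its marker, its length and its hash all check out, so a record cut off by a crash is simply ignored.

```python
import hashlib
import io
import struct
import time

MARKER = b"\xabTXLG"                 # hypothetical record marker
HEADER = struct.Struct(">QQHIQ")     # serial, inodenr, path length, data length, timestamp
HASH_LEN = hashlib.sha256().digest_size


def append_record(log, serial, inodenr, path, data, timestamp=None):
    """Append one add/replace record to the end of the log."""
    ts = int(time.time()) if timestamp is None else timestamp
    path_bytes = path.encode("utf-8")
    header = HEADER.pack(serial, inodenr, len(path_bytes), len(data), ts)
    digest = hashlib.sha256(header + path_bytes + data).digest()
    log.seek(0, io.SEEK_END)
    log.write(MARKER + header + path_bytes + digest + data)


def read_records(log):
    """Yield (offset, serial, inodenr, path, timestamp, data) for each
    complete record, stopping at the first truncated or corrupt one."""
    log.seek(0)
    fixed_len = len(MARKER) + HEADER.size
    while True:
        offset = log.tell()
        fixed = log.read(fixed_len)
        if len(fixed) < fixed_len or not fixed.startswith(MARKER):
            return
        header = fixed[len(MARKER):]
        serial, inodenr, path_len, data_len, ts = HEADER.unpack(header)
        body = log.read(path_len + HASH_LEN + data_len)
        if len(body) < path_len + HASH_LEN + data_len:
            return                    # record was cut off mid-write
        path_bytes = body[:path_len]
        digest = body[path_len:path_len + HASH_LEN]
        data = body[path_len + HASH_LEN:]
        if hashlib.sha256(header + path_bytes + data).digest() != digest:
            return                    # hash mismatch: treat as incomplete
        yield offset, serial, inodenr, path_bytes.decode("utf-8"), ts, data


def find_latest_complete(log):
    """Return the last complete record, or None if the log has none."""
    latest = None
    for record in read_records(log):
        latest = record
    return latest


def seek_to_serial(log, n):
    """Position the log at the start of record srl=n; return True if found."""
    for record in read_records(log):
        if record[1] == n:
            log.seek(record[0])
            return True
    return False


# Usage: two good records followed by a record truncated by a "crash".
log = io.BytesIO()
append_record(log, 1, 1001, "certs/a", b"first")
append_record(log, 2, 1002, "certs/b", b"second")
log.write(MARKER + b"\x00" * 5)
latest = find_latest_complete(log)
```

The linear scan keeps the sketch short; a real log would probably keep an index or checkpoints so that seeking to srl=N does not require reading from the start.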

 


ext3 journalling discussions  (this is about spinning-rust, not SSD)

What IS worth doing is to write the transaction logs to different hardware than the database.

It is extremely important to do that, at least with PostgreSQL, but I would imagine for any database. I have 6 hard drives on my machine, with 4 of them having one partition each, just for the database. For any particular table, I put the index on a different hard drive from the associated data to minimize seek contention while loading the database, something I do often.

I put the Write-Ahead Log, as they call it, on a separate hard drive of its own. Even then, a bulk database load that does mainly INSERTs (don't ask) is seek-limited on the WAL drive. Not IOWAIT: seek!


 

MySQL master-slave and master-master replication (auto-increment options!!)

MySQL backup and recovery, including LVM snapshots and hot backup


Linux (SGI) File Alteration Monitor (FAM) and Imon


DRBD provides synchronous block-device replication over the network ("RAID to a different machine"):

DRBD (used to be NBD = network block device). Note: you must NOT mount the slave (not even readonly!)

DATA Replication with  drbd.org

  • tokyocabinet on LVM over DRBD ("Protocol C" synchronous replication)
    • or tokyocabinet on ext4 filesystem on LVM over DRBD
    • postgresql or simple files could also be used as key-value store
  • use tokyocabinet transaction (see "transaction" here ) to commit a bunch of transactions (batch) at the same time
  • symmetric-encrypt result certificates with the same key (key changed for every batch):
    • advertise the decrypt key (after data is safely sync-ed to DRBD storage) so clients can only decrypt certificate if all goes well
    • or use different encryption key for each certificate: but keys derived from same salt (salt advertised after sync() )
    • alternatively the server could hold back the certificates, but that results in a more complex "stateful" server (more work to do right and more testing )
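
The "different key per certificate, derived from the same salt" variant could look roughly like the sketch below. Everything in it is an assumption made for illustration: the names derive_key, seal_batch and open_certificate are hypothetical, HMAC-SHA256 is just one plausible way to derive keys from a salt, and keystream_xor is a toy stand-in for a real cipher so that the example runs with the standard library only. A real deployment should use a vetted authenticated cipher instead.

```python
import hashlib
import hmac
import os


def derive_key(salt: bytes, index: int) -> bytes:
    """Per-certificate key derived from the batch salt (HMAC-SHA256)."""
    return hmac.new(salt, index.to_bytes(8, "big"), hashlib.sha256).digest()


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with SHA-256(key || counter) blocks.
    Illustration only; applying it twice with the same key decrypts."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(a ^ b for a, b in zip(chunk, pad))
    return bytes(out)


def seal_batch(certificates, salt):
    """Encrypt every certificate of a batch under its own derived key."""
    return [keystream_xor(derive_key(salt, i), cert)
            for i, cert in enumerate(certificates)]


def open_certificate(sealed: bytes, salt: bytes, index: int) -> bytes:
    """Client side: decrypt once the salt has been advertised."""
    return keystream_xor(derive_key(salt, index), sealed)


# Usage: the server sends `sealed` immediately, but advertises `salt`
# only after the batch has been synced to the replicated storage.
salt = os.urandom(32)                      # fresh salt for every batch
sealed = seal_batch([b"certificate A", b"certificate B"], salt)
plain = open_certificate(sealed[1], salt, 1)
```

Until the salt is published, a sealed certificate is useless to the client, so a crash before the sync cannot leave a usable certificate that the issuer has no record of.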

DRBD-benchmark

  • RUNNING drbd.org (8.3.11) synchronous block-device replication over 100 Mbit Ethernet (between a powerful host and a weak notebook), creating 100k files each time
  • ~4800 files / sec without sync()
  • and 1800-2000 files / sec when syncing rarely, e.g. after every 1000 files.
  • With SSDs on both nodes (and a direct 100 Mbit Ethernet wire between the nodes), performance rose to about 46 syncs / sec, up from the spinning-rust limit of 14 syncs / sec. The peak is still the same, approx. 1800 files / sec when syncing rarely (5-10 times per second)
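
A measurement of this kind can be reproduced with a few lines of Python. The sketch below is not the script used for the numbers above (that script is not shown on this page); the function name create_files, the file naming and the 100-byte payload are assumptions. It creates small files and calls os.sync() (Unix only) after every sync_every files, so pointing it at a directory on the DRBD-backed filesystem would give comparable "files / sec" figures.

```python
import os
import tempfile
import time


def create_files(directory, count, sync_every=0, payload=b"x" * 100):
    """Create `count` small files in `directory`, calling os.sync() after
    every `sync_every` files (0 means never). Returns files per second."""
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"f{i:07d}")
        with open(path, "wb") as fh:
            fh.write(payload)
        if sync_every and (i + 1) % sync_every == 0:
            os.sync()                 # flush everything written so far
    elapsed = max(time.perf_counter() - start, 1e-9)
    return count / elapsed


if __name__ == "__main__":
    # Small demo in a temporary directory; use a directory on the
    # replicated device and a much larger count for a real measurement.
    with tempfile.TemporaryDirectory() as tmp:
        print("no sync:       ", round(create_files(tmp, 200)), "files/sec")
    with tempfile.TemporaryDirectory() as tmp:
        print("sync every 100:", round(create_files(tmp, 200, 100)), "files/sec")
```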

Conclusion: I'm pretty certain that sync() after a bunch of transactions is the (final) way to go. It could be a lot of work to implement (properly) a "stateful" server that only releases the certificates after the batch-sync. But if the responses are sent symmetric-encrypted (even with the same key for the whole batch, or with different keys generated from the same advertised salt), and the decryption key is only advertised after the batch-sync has returned (all data safe on the primary and secondary node), then it can be done in a "stateless" server relatively easily.

Minor DRBD notes:

  • The initial sync ran at only 1500 KBytes / sec, but this can be adjusted (see the drbd.org docs) to 8-9 MBytes / sec or more if the network allows. (Often the initial sync is not done through DRBD anyway; the secondary is pre-seeded with rsync.)
  • I was unable to mount drbd0 at times, e.g. after a reboot. It was annoying at first, but there are just a few small tricks one must be aware of. E.g. the node has to be promoted to primary (this might be necessary again and again after the service goes down, after a lost connection, or maybe after every reboot?), and "drbdadm adjust ..." was also necessary at some point to get the synchronization started (promoting one side to primary was not enough). Of course, cat /proc/drbd and the documentation are your friends.

rsync -auP a/ b/
rsync -auP b/ a/

While this propagates additions well, it has the unfortunate side effect of resurrecting files deleted in either place.
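
The resurrection effect is easy to see in a small model. The sketch below does not call rsync; it only mimics the relevant behaviour of `rsync -au` under a simplifying assumption: each directory is a dict mapping file name to modification time, a file is copied when it is missing or older on the receiving side, and nothing is ever deleted. The function name rsync_update is hypothetical.

```python
def rsync_update(src: dict, dst: dict) -> None:
    """Model of `rsync -au src/ dst/`: copy files that are missing or
    older in dst, and never delete anything from dst."""
    for name, mtime in src.items():
        if name not in dst or dst[name] < mtime:
            dst[name] = mtime


# Both sides start in sync.
a = {"x": 1, "y": 1}
b = {"x": 1, "y": 1}

del a["y"]      # file y is deleted on side a
b["z"] = 2      # file z is added on side b

rsync_update(a, b)   # rsync -auP a/ b/
rsync_update(b, a)   # rsync -auP b/ a/
# The addition z has reached a, but the deleted y is back on a as well.
```

Without a record of the previous synchronized state, a missing file cannot be told apart from one that was never copied, which is why the two-pass copy always ends in the union of both sides.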

Unison is slow (instead of timestamps it always hashes all files)


Git-sync: an offline two-way synchronization tool, where any of the backup copies can be modified independently.

 


 



Created by: cell. Last Modification: 2012-07-25 (Wed) 15:15:06 CEST by admin.