Encoders

Encoder Interface

class acid.encoders.RecordEncoder(name, unpack, pack, new=None, get=None, set=None, delete=None)

Instances of this class represent a record encoding and provide accessors for the record’s value type. To support a new encoding, instantiate this class.

name:
ASCII string uniquely identifying the encoding. A future version may use this to verify the encoding matches what was used to create the acid.Collection.
unpack:
Function invoked as func(key, data) to deserialize an encoded record. The data argument may be a buffer. If your encoder does not support the buffer() interface (many C extensions do), then first convert it using str().
pack:
Function invoked as func(record) to serialize a record. The function may return str() or any object supporting the buffer() interface.
new:
Function that produces a new, empty instance of the encoder’s value type. Used by acid.meta to manufacture empty Model instances. The default is dict.
get:
Function invoked as func(obj, attr, default) to return the value of attribute attr from obj if it is set, otherwise default. Used by acid.meta to implement attribute access. The default is operator.getitem().
set:
Function invoked as func(obj, attr, value) to set the attribute attr on obj to value. Used by acid.meta to implement attribute access. The default is operator.setitem().
delete:
Function invoked as func(obj, attr) to delete the attribute attr from obj. Used by acid.meta to implement attribute access. The default is operator.delitem().
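
The callables described above can be sketched with the standard library alone. The following is a minimal illustration written for Python 3 (where bytes() plays the role of str() for buffer conversion), not code taken from acid. The "point" record type and all function names are hypothetical, and the RecordEncoder call is shown only as a comment because it assumes acid is installed.

```python
import struct

# Hypothetical record type: a dict with integer 'x' and 'y' fields,
# stored as two big-endian signed 32-bit integers.
_POINT = struct.Struct('>ii')


def unpack_point(key, data):
    """Invoked as func(key, data); data may be a buffer-like object."""
    x, y = _POINT.unpack(bytes(data))  # bytes() copies buffer-like input
    return {'x': x, 'y': y}


def pack_point(record):
    """Invoked as func(record); returns a bytestring."""
    return _POINT.pack(record['x'], record['y'])


def get_field(obj, attr, default=None):
    """Invoked as func(obj, attr, default)."""
    return obj.get(attr, default)


def set_field(obj, attr, value):
    """Invoked as func(obj, attr, value)."""
    obj[attr] = value


def delete_field(obj, attr):
    """Invoked as func(obj, attr)."""
    del obj[attr]


# Assumption: with acid installed, the pieces would be combined roughly as
#   POINT = acid.encoders.RecordEncoder(
#       'point', unpack_point, pack_point,
#       new=dict, get=get_field, set=set_field, delete=delete_field)

point = {}
set_field(point, 'x', 3)
set_field(point, 'y', -7)
packed = pack_point(point)
restored = unpack_point(None, memoryview(packed))  # buffer-like input
```

The round trip through memoryview() mimics the case where the storage engine hands the encoder a buffer rather than a bytestring.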

Compressor Interface

class acid.encoders.Compressor(name, unpack, pack)

Represents a compression method. You must instantiate this class and pass it to acid.Store.add_encoder() to register a new compressor.

name:
ASCII string uniquely identifying the compressor. A future version may use this to verify the encoding matches what was used to create the Collection. For encodings used as compressors, this name is persisted forever in Store’s metadata after first use.
unpack:
Function invoked as func(data) to decompress a bytestring. The data argument may be a buffer. If your compressor does not support the buffer() interface (many C extensions do), then first convert it using str(). The function may return str() or any object supporting the buffer() interface.
pack:
Function invoked as func(data) to compress a bytestring. The data argument may be a buffer. If your compressor does not support the buffer() interface (many C extensions do), then first convert it using str(). The function may return str() or any object supporting the buffer() interface.
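
As an illustration of the pack/unpack contract, the sketch below wraps the standard bz2 module. It is written for Python 3 and is not part of acid. The function names and the 'bz2' compressor name are hypothetical, and the registration calls appear only as comments since they assume acid is installed.

```python
import bz2


def bz2_pack(data):
    """Invoked as func(data) to compress; data may be buffer-like."""
    return bz2.compress(bytes(data))


def bz2_unpack(data):
    """Invoked as func(data) to decompress; data may be buffer-like."""
    return bz2.decompress(bytes(data))


# Assumption: with acid installed, registration would look roughly like
#   BZ2 = acid.encoders.Compressor('bz2', bz2_unpack, bz2_pack)
#   store.add_encoder(BZ2)

payload = b'acid ' * 200
compressed = bz2_pack(payload)
```

Converting with bytes() first keeps the functions safe when the caller passes a buffer instead of a bytestring.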

Predefined Record Encoders

acid.encoders.KEY

This predefined RecordEncoder uses acid.keylib.packs() and acid.keylib.unpacks() to serialize tuples. It is used internally to represent keys, counters, and Store metadata.

acid.encoders.JSON

This predefined RecordEncoder uses json.dumps() and json.loads() to serialize compatible objects. It is the default encoder if no specific encoder= argument is given to the Collection constructor.

Predefined Compressors

acid.encoders.PLAIN

This predefined Compressor returns its input unchanged. It is used as the default Collection(..., packer=) argument when no explicit compressor is provided.

acid.encoders.ZLIB

This predefined Compressor uses zlib.compress() and zlib.decompress() to provide value compression. It may be passed as the packer= argument to Collection.put, or specified as the default using the packer= argument to the Collection constructor.

make_pickle_encoder

acid.encoders.make_pickle_encoder(protocol=2)

Return a RecordEncoder that serializes objects using the cPickle module. protocol specifies the protocol version to use.
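
To show what the protocol argument controls, here is a rough sketch of pack and unpack callables built on pickle. It is an assumption about the general shape of such an encoder, not acid's implementation. make_pickle_functions is a hypothetical helper, and Python 3's pickle stands in for cPickle.

```python
import functools
import pickle  # cPickle in Python 2


def make_pickle_functions(protocol=2):
    """Return (unpack, pack) callables matching the RecordEncoder contract."""
    # pack is invoked as func(record)
    pack = functools.partial(pickle.dumps, protocol=protocol)

    def unpack(key, data):
        # unpack is invoked as func(key, data); data may be buffer-like
        return pickle.loads(bytes(data))

    return unpack, pack


unpack, pack = make_pickle_functions(protocol=2)
record = {'username': u'David', 'age': 42}
blob = pack(record)
```

Only unpickle data from sources you trust, since pickle can execute arbitrary code while loading.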

make_json_encoder

acid.encoders.make_json_encoder(separators=', :', **kwargs)

Return a RecordEncoder that serializes dict/list/string/float/int/bool/None objects using the json module. separators and kwargs are passed to the json.JSONEncoder constructor.

The ujson package will be used for decoding if it is available, otherwise json.loads() is used.

Warning

Strings passed to the encoder must be Unicode. Otherwise json.JSONEncoder silently converts them, so their original and deserialized representations no longer match, and index entries become inconsistent between create/update and delete.

For this reason, the json.JSONEncoder encoding='undefined' option is forcibly enabled, causing exceptions to be raised when attempting to serialize a bytestring. You must explicitly .decode() all bytestrings.
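
The decode-first rule can be illustrated with the standard json module alone. This sketch does not use acid. The record contents and the UTF-8 encoding are assumptions made for the example, and the separators tuple is the ordinary json way of requesting compact output.

```python
import json

raw_city = b'Trantor'  # a bytestring, e.g. read from a file or socket

record = {
    'username': u'David',
    'city': raw_city.decode('utf-8'),  # decode explicitly before encoding
    'age': 42,
}

# Compact separators avoid the default padding after ',' and ':'.
encoded = json.dumps(record, separators=(',', ':'), sort_keys=True)
decoded = json.loads(encoded)
```

Because every string was Unicode before serialization, the decoded record compares equal to the original, which is what keeps index entries consistent.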

make_msgpack_encoder

acid.encoders.make_msgpack_encoder()

Return a RecordEncoder that serializes dict/list/string/float/int/bool/None objects using MessagePack via the msgpack-python package.

make_thrift_encoder

acid.encoders.make_thrift_encoder(klass, factory=None)

Return a RecordEncoder instance that serializes Apache Thrift structs using a compact binary representation.

klass:
Thrift-generated struct class that the encoder serializes.
factory:
Thrift protocol factory for the desired protocol, defaults to TCompactProtocolFactory.

Example

Create a myproject.thrift file:

struct Person {
    1: string username,
    2: string city,
    3: i32 age
}

Now define a collection:

# The 'myproject' package is generated by 'thrift --gen py myproject.thrift'
import acid
from myproject.ttypes import Person
from acid.encoders import make_thrift_encoder

# 'store' is assumed to be an already-open acid Store instance.
coll = acid.Collection(store, 'people',
    encoder=make_thrift_encoder(Person))
coll.add_index('username', lambda person: person.username)
coll.add_index('age_city', lambda person: (person.age, person.city))

user = Person(username=u'David', age=42, city='Trantor')
coll.put(user)

assert coll.indices['username'].get(u'David') == user

# Minimal overhead:
packed = coll.encoder.pack(Person(username='dave'))
assert packed == '\x18\x04dave\x00'

Key Functions

These functions are based on SQLite 4’s key encoding, except that:

  • Support for uuid.UUID is added.
  • Floats are removed.
  • Varints are used for integers.

acid.keylib.pack(prefix, tups)

Alias for packs()

acid.keylib.packs(tups, prefix=None)

Encode a list of tuples of primitive values to a bytestring that preserves a meaningful lexicographical sort order.

prefix:
Initial prefix for the bytestring, if any.

A bytestring is returned such that elements of different types at the same position within distinct sequences with otherwise identical prefixes will sort in the following order.

  1. None
  2. Negative integers
  3. Positive integers
  4. False
  5. True
  6. Bytestrings (i.e. str()).
  7. Unicode strings.
  8. uuid.UUID instances.
  9. datetime.datetime instances.
  10. Sequences with another tuple following the last identical element.

If tups is not exactly a list, it is assumed to be a single key, and will be treated as if it were wrapped in a list.

If the type of any list element is not exactly a tuple, it is assumed to be a single primitive value, and will be treated as if it were a 1-tuple key.

>>> packs(1)      # Treated like packs([(1,)])
>>> packs((1,))   # Treated like packs([(1,)])
>>> packs([1])    # Treated like packs([(1,)])
>>> packs([(1,)]) # Treated like packs([(1,)])
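
The wrapping rules above can be expressed in a few lines. The sketch below only normalizes the argument and performs no encoding. normalize_keys is a hypothetical name and this is not the library's code.

```python
def normalize_keys(tups):
    """Apply the list/tuple wrapping rules described for packs()."""
    # "Not exactly a list": treat the argument as a single key.
    if type(tups) is not list:
        tups = [tups]
    # "Not exactly a tuple": treat the element as a 1-tuple key.
    return [t if type(t) is tuple else (t,) for t in tups]


examples = [1, (1,), [1], [(1,)]]
normalized = [normalize_keys(e) for e in examples]
```

The exact type checks (rather than isinstance) mirror the "not exactly" wording, so subclasses of list or tuple are treated as single values.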

acid.keylib.unpack()

Alias for unpacks() with first=True.

acid.keylib.unpacks(s, prefix=None, first=False)

Decode a bytestring produced by keylib.packs(), returning the list of tuples the string represents.

prefix:
If specified, a string prefix of this length will be skipped before decoding begins. If the passed string does not start with the given prefix, None is returned and the string is not decoded.
first:
Stop work after the first tuple has been decoded and return it immediately. Note the return value is the tuple, not a list containing the tuple.

acid.keylib.invert(s)

Invert the bits in the bytestring s.

This is used to achieve a descending order for blobs and strings when they are part of a compound key. When they are stored as a 1-tuple, however, it is probably better to simply create the corresponding Collection or Index with reverse=True.
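
The effect of bit inversion on sort order can be seen with a small stand-alone sketch. invert_bytes is a hypothetical pure-Python equivalent written for illustration, not the library's implementation, and the sample strings were chosen so that none is a prefix of another.

```python
def invert_bytes(s):
    """Return s with every bit flipped (each byte XORed with 0xFF)."""
    return bytes(bytearray(b ^ 0xFF for b in bytearray(s)))


names = [b'apple', b'banana', b'cherry']

# Sorting on the inverted form yields the reverse of the natural order.
descending = sorted(names, key=invert_bytes)
```

Applying the function twice restores the original bytestring, so an inverted key component can be recovered after decoding.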