Storage Options and Architecture in MyTardis¶
Database layout for storage¶
The storage for each DataFile is configured individually. A “way to store data” is called StorageBox. Each file has one or many related DataFileObjects, which link a DataFile with a StorageBox. A DataFile can have several copies stored in different StorageBoxes via several DataFileObjects.
StorageBoxes¶
StorageBoxes contain all the information needed to store a file except an id unique to the file and storagebox.
Each StorageBox points to a class that implements the Django storage API via a python module path as string.
Optional instatiation parameters for each StorageBox
can be stored in
StorageBoxOptions. These are used as parameters to the storage class set in
the django_storage_class
attribute of a StorageBox
These parameters are string types by default. However, by
setting the optional parameter value_type
to 'pickle'
, any picklable
object can be stored here and hence used for instantiation of the storage
class.
Optional classification and other metadata can be stored in StorageBoxAttributes.
A special case is where someone registers a file and wants to put it into location themselves but needs to be given the place to put it (via the API). Such situation can only be resolved with StorageBoxes that implement the “build_save_location” function. Such StorageBoxes need to have a StorageBoxAttribute with key “staging” and value “True”.
DataFiles¶
DataFiles are the logical representation of the file. They contain information about the name, size, checksum, dates etc.
DataFileObjects¶
DataFileObjects point to the DataFile they belong to and the StorageBox they reside in. They also have an identifier that the StorageBox uses to find the actual file. DataFileObjects also have a date, and a state-flag.
Available backends¶
Django storage API compatible backends are available for example at https://github.com/jschneier/django-storages
We have tested the following backends with MyTardis:
- File on disk or any other system mounted storage
- SFTP
- SWIFT Object Storage
Documented backends¶
Appendix: Conversion of ‘Replicas’¶
Replicas used to be the method of file storage abstraction in MyTardis versions 3.x. The StorageBoxes replace this. For pain-free upgrading, a conversion has been included with the database migrations that works as follows:
All ‘Locations’ that are local are converted to default (folder on disk) StorageBoxes. All ‘Locations’ that are not local are converted to dummy ‘link only’ StorageBoxes with the corresponding name. These can be upgraded manually to a StorageBox with an appropriate backend after the migration has taken place.
Max size is set to size of disk, hence for multiple locations on the same disk this number provides no protection. This also should be set to reasonable values manually after the migration.
Each ‘Replica’ of a file will be converted to a DataFileObject pointing to the relevant StorageBox.
All files are manually reverified, so that unverified entries can be checked for errors.