- Database (preferred: OrientDB(!), a graph database), but we might discuss whether ordinary SQL is enough
=> number of records: 1 to 25 million, each with about 10 columns/fields (might be extended later on)
- GUI/frontend needed to access the database data
- I would assume the frontend would be a desktop app
- but a browser-based frontend might also be good (especially considering the cloud database aspect below)
- no editing of data is needed as part of the user use cases.
- use cases:
access data (browse data)
statistics on data
visualization of data (graph/network type of visualization - optional)
analysis of data (on whole data set and parts of data sets)
- mostly straightforward data extraction tasks; some are more tricky.
visualization of analysis results (graph/network type of visualization - optional)
in most cases only a table and a (directory) tree representation are needed
- the visualization/browsing/etc. should rely on lazy loading from the database, as the full dataset will not fit into memory or the display model
- part of the analysis is the tricky part: it should identify identical as well as merely similar(!) data records and subsets of data records(!) across the whole data set.
- the analysis part would benefit from parallelization. I could imagine putting the whole database in the cloud (e.g. AWS, Google, etc.) and keeping only the frontend local. Ideally, the backend database would be configurable to run either locally or in the cloud.
- focus is on large amounts of data, quick on-the-fly analysis, and quick display; beauty is secondary.
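To illustrate the lazy-loading requirement, here is a minimal sketch of keyset pagination. It assumes an ordinary SQL backend and uses SQLite as a stand-in. The table `records(id, name, value)` is hypothetical. Only one page of rows is held in memory at a time, and each page resumes after the last `id` seen instead of using `OFFSET`, so deep pages stay cheap on an indexed primary key.

```python
import sqlite3
from typing import Iterator


def iter_pages(conn: sqlite3.Connection, page_size: int = 1000,
               after_id: int = 0) -> Iterator[list[tuple]]:
    """Yield successive pages of rows using keyset pagination.

    Assumes a hypothetical table `records(id INTEGER PRIMARY KEY, name, value)`.
    Only the current page is kept in memory.
    """
    last_id = after_id
    while True:
        rows = conn.execute(
            "SELECT id, name, value FROM records WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # the next page starts after the last id seen


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT, value REAL)")
    conn.executemany("INSERT INTO records (name, value) VALUES (?, ?)",
                     [(f"rec{i}", i * 0.5) for i in range(2500)])
    pages = iter_pages(conn, page_size=1000)
    first = next(pages)  # only the first page has been fetched so far
    print(len(first), first[0])  # 1000 (1, 'rec0', 0.0)
```

A table or tree view in the frontend would call `next()` on the iterator (or `iter_pages(..., after_id=...)`) only when the user scrolls or expands a node.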
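For the tricky analysis part (identical vs. merely similar records), one possible starting point is sketched below. It is only a sketch under assumptions. Identical records are grouped by their full field tuple. "Similar" is assumed here to mean a Jaccard similarity of at least 0.6 over the records' (column index, value) pairs. The all-pairs comparison is O(n²), which is infeasible at 25 million records. At that scale a blocking key or a technique such as MinHash with locality-sensitive hashing would be needed. Detecting similar *subsets* of records is not covered.

```python
from collections import defaultdict
from itertools import combinations


def exact_duplicates(records: dict[int, tuple]) -> list[list[int]]:
    """Group record ids whose field values are all identical."""
    groups: dict[tuple, list[int]] = defaultdict(list)
    for rid, fields in records.items():
        groups[fields].append(rid)
    return [ids for ids in groups.values() if len(ids) > 1]


def jaccard(a: tuple, b: tuple) -> float:
    """Jaccard similarity over the (column index, value) pairs of two records."""
    sa, sb = set(enumerate(a)), set(enumerate(b))
    return len(sa & sb) / len(sa | sb) if sa or sb else 1.0


def similar_pairs(records: dict[int, tuple],
                  threshold: float = 0.6) -> list[tuple[int, int, float]]:
    """Naive O(n^2) search for similar but non-identical record pairs."""
    result = []
    for (i, a), (j, b) in combinations(records.items(), 2):
        if a == b:
            continue  # identical records are reported by exact_duplicates
        score = jaccard(a, b)
        if score >= threshold:
            result.append((i, j, score))
    return result


if __name__ == "__main__":
    records = {
        1: ("alice", "berlin", 30, "a", "x"),
        2: ("alice", "berlin", 30, "a", "x"),  # identical to 1
        3: ("alice", "berlin", 30, "a", "y"),  # differs from 1 and 2 in one field
        4: ("bob", "paris", 41, "b", "z"),
    }
    print(exact_duplicates(records))  # [[1, 2]]
    print([(i, j, round(s, 2)) for i, j, s in similar_pairs(records)])
    # [(1, 3, 0.67), (2, 3, 0.67)]
```

Comparing (column, value) pairs rather than bare values keeps "berlin" in a name column from matching "berlin" in a city column.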
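The "statistics on data" use case and the wish for parallel analysis fit a map-reduce shape. Each chunk (in practice, for example, one id range fetched from the database) is summarized independently, and the partial results are merged. The sketch below assumes simple numeric statistics (count, sum, min, max, mean) over a single column. The demo uses a `ThreadPoolExecutor`. For CPU-bound pure-Python work, a `ProcessPoolExecutor` can be passed instead, because both implement the same `Executor.map` interface and `chunk_stats` is a picklable top-level function.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Stats:
    count: int = 0
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")

    def merge(self, other: "Stats") -> "Stats":
        """Reduce step: combine two partial results."""
        return Stats(self.count + other.count, self.total + other.total,
                     min(self.minimum, other.minimum), max(self.maximum, other.maximum))

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else float("nan")


def chunk_stats(values: list[float]) -> Stats:
    """Map step: statistics for one chunk of values."""
    if not values:
        return Stats()
    return Stats(len(values), sum(values), min(values), max(values))


def parallel_stats(values: list[float], chunk_size: int, executor: Executor) -> Stats:
    """Split values into chunks, summarize them in parallel, then merge."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    result = Stats()
    for partial in executor.map(chunk_stats, chunks):
        result = result.merge(partial)
    return result


if __name__ == "__main__":
    data = [float(x) for x in range(1, 101)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        s = parallel_stats(data, chunk_size=25, executor=pool)
    print(s.count, s.total, s.mean, s.minimum, s.maximum)  # 100 5050.0 50.5 1.0 100.0
```

Because `merge` is associative, the same partial results could also be produced by separate cloud workers and combined locally.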
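To make the backend configurable as either local or cloud-hosted, one option is to select it from a single connection URL. In the sketch below, the environment variable `ANALYSIS_DB_URL` and the `sqlite:<path>` URL form are hypothetical conventions chosen for illustration, not a standard. Only the local SQLite case is wired up. A cloud database would need its own driver behind the same `connect` call.

```python
import os
import sqlite3
from typing import Mapping, Optional
from urllib.parse import urlparse

# Hypothetical convention for selecting the backend, e.g.
#   ANALYSIS_DB_URL=sqlite:analysis.db                 (local file)
#   ANALYSIS_DB_URL=postgresql://db.example.com/recs   (cloud-hosted server)
DEFAULT_URL = "sqlite:analysis.db"


def backend_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the backend URL from the environment, defaulting to a local file."""
    env = os.environ if env is None else env
    return env.get("ANALYSIS_DB_URL", DEFAULT_URL)


def is_local(url: str) -> bool:
    """True if the URL points at the local SQLite backend."""
    return urlparse(url).scheme == "sqlite"


def connect(url: str) -> sqlite3.Connection:
    """Open a connection for the configured backend.

    Only SQLite is implemented here; other schemes are placeholders.
    """
    parsed = urlparse(url)
    if parsed.scheme == "sqlite":
        return sqlite3.connect(parsed.path)  # e.g. "analysis.db" or ":memory:"
    raise NotImplementedError(f"no driver wired up for scheme {parsed.scheme!r}")


if __name__ == "__main__":
    conn = connect("sqlite::memory:")
    print(is_local("sqlite::memory:"), conn.execute("SELECT 1").fetchone())  # True (1,)
```

Keeping the frontend unaware of which scheme is in use means switching between a local and a cloud database becomes a configuration change rather than a code change.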