Posted

So, I am currently working on a database project that would improve (or try to improve) on the current model of database queries and related features. One idea I had was to let a column hold values of multiple types instead of having one fixed type. Would this be a bad idea?

 

Also, would allowing queries to contain arrays/lists be bad?

 

If anyone has suggestions for new features a database could have, they would be welcome. I am currently testing my programming skills and seeing how far I can go with a project like this.

Posted (edited)

"Would this be a bad idea?"

More memory consumption. The type of the field has to be attached to every cell, so something like at least 1-4 bytes more per cell *). That means more data has to be read from and written to disk.

*) A variable-length string needs 4 bytes for its length with a 4 GB limit, or 8 bytes for its length on a 64-bit computer.

Or you need to figure out how to have a null-terminated cell.

That might be really problematic.
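
A minimal sketch of the per-cell overhead described above, in Python for brevity. The tag values and the names encode_cell/decode_cell are made up for illustration and are not from any real database; it assumes a 1-byte type tag and a 4-byte length prefix for text.

```python
import struct

# Hypothetical type tags; the numeric values are arbitrary for this sketch.
TAG_INT, TAG_FLOAT, TAG_TEXT = 1, 2, 3

def encode_cell(value):
    """Serialize one cell as a 1-byte type tag followed by its payload."""
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        return struct.pack("<Bq", TAG_INT, value)       # tag + 8-byte signed integer
    if isinstance(value, float):
        return struct.pack("<Bd", TAG_FLOAT, value)     # tag + 8-byte double
    if isinstance(value, str):
        data = value.encode("utf-8")
        # tag + 4-byte length prefix (hence the 4 GB limit) + the bytes themselves
        return struct.pack("<BI", TAG_TEXT, len(data)) + data
    raise TypeError(f"unsupported cell type: {type(value).__name__}")

def decode_cell(buf, offset=0):
    """Decode the cell starting at offset; return (value, offset_of_next_cell)."""
    tag = buf[offset]
    if tag == TAG_INT:
        return struct.unpack_from("<q", buf, offset + 1)[0], offset + 9
    if tag == TAG_FLOAT:
        return struct.unpack_from("<d", buf, offset + 1)[0], offset + 9
    if tag == TAG_TEXT:
        (length,) = struct.unpack_from("<I", buf, offset + 1)
        start = offset + 5
        return buf[start:start + length].decode("utf-8"), start + length
    raise ValueError(f"unknown type tag: {tag}")

# One "row" whose cells have different types.
row = b"".join(encode_cell(v) for v in (42, 2.5, "abc"))
```

With this layout an integer costs 9 bytes instead of 8, and a 3-byte string costs 8 bytes, which is the kind of extra I/O the post is talking about.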

Take for example a db with a fixed row size: to access row y you just need to calculate offset = y * sizeof(row), and you're done; you read or write the whole row with sizeof(row) bytes. Extremely simple.

If every row has a different length, then you need to store these lengths somewhere.
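
The offset = y * sizeof(row) arithmetic can be sketched like this in Python. The row layout (an int id, a 16-byte name, a double balance) is a hypothetical example, and the name is assumed to be short ASCII so that it fits the fixed field.

```python
import io
import struct

# Hypothetical fixed row layout: int32 id, 16-byte name, float64 balance.
ROW = struct.Struct("<i16sd")   # ROW.size plays the role of sizeof(row)

def write_row(f, y, row_id, name, balance):
    """Overwrite row y in place; no other row has to move."""
    f.seek(y * ROW.size)
    f.write(ROW.pack(row_id, name.encode("ascii"), balance))

def read_row(f, y):
    """Jump straight to row y and read exactly one row."""
    f.seek(y * ROW.size)
    row_id, name, balance = ROW.unpack(f.read(ROW.size))
    return row_id, name.rstrip(b"\0").decode("ascii"), balance

# An in-memory file stands in for the table file on disk.
table = io.BytesIO()
rows = [(1, "alice", 10.0), (2, "bob", 20.5), (3, "carol", 30.0)]
for y, (row_id, name, balance) in enumerate(rows):
    write_row(table, y, row_id, name, balance)

second = read_row(table, 1)
```

Here every row is 28 bytes, so row 1 starts at byte 28 and no index is needed. The price is that a name longer than 16 bytes gets cut off by the fixed field.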

Edited by Sensei
Posted

That is a problem I will have to try to solve. I also realized that with array queries you could store a theoretically infinite number of elements, which defeats the purpose of having the column in the first place. I might have to limit the size of the array/list within a query.

Posted

With CSV, which is the "poor man's database", we have similar problems.

At the beginning we know only the file name and path.

We have to go through all the rows with e.g. fgets() (which stops reading at EOL), counting them. At the same time we can store the offset at which each row starts in the file, and the offset at which each column starts within its row. That gives us an index of rows and an index of columns within rows.

 

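Then to read the cell at (col, row) we just do something like offset = index[row][col] and read a buffer of size = index[row][col+1] - index[row][col]. (That size includes the trailing separator, and the last column needs an end-of-row entry in the index.)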
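
A rough sketch of that indexing pass in Python. The function names are made up, and it assumes a simple CSV with a one-byte separator and no quoted fields containing separators or line breaks. Each row's offset list gets one extra end-of-row entry so that index[row][col+1] also works for the last column.

```python
import os
import tempfile

def build_index(path, sep=b","):
    """Scan the file once; for each row store the byte offset of every cell."""
    index = []
    with open(path, "rb") as f:
        pos = 0
        for line in f:                       # one line per iteration, like fgets()
            body = line.rstrip(b"\r\n")
            offsets = [pos]                  # the first cell starts where the row starts
            for i, byte in enumerate(body):
                if byte == sep[0]:
                    offsets.append(pos + i + 1)
            # End-of-row entry, placed as if the row ended with a separator.
            offsets.append(pos + len(body) + 1)
            index.append(offsets)
            pos += len(line)
    return index

def read_cell(path, index, row, col):
    """Seek directly to one cell and read only its bytes."""
    start = index[row][col]
    size = index[row][col + 1] - start - 1   # -1 drops the separator
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(size).decode("utf-8")

# Demo on a small temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"id,name,city\n1,alice,newyork\n2,bob,boston\n")
index = build_index(tmp.name)
cell = read_cell(tmp.name, index, 2, 1)      # row 2, column 1
last = read_cell(tmp.name, index, 1, 2)      # last column of row 1
os.remove(tmp.name)
```

After the single pass, any cell can be fetched with one seek and one small read instead of re-parsing the file.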

 

A real database must have such an index file already built, so that it does not have to parse the entire db each time the db is opened.

 

Posted

I used to use plain text files and directories. For instance, if a user is from New York, then when the user creates an account it is added to the directory newyork, with another directory for their username inside that folder, and then subsequent directories for their files, i.e. personal information, images, videos, emails, password information, etc.

Posted (edited)

The way I am going about storing all the queries and columns is the linked-list method, where each query references the next query within the column. This makes it easier to remove a query from the column, in my opinion. Also, for searching for a query, I would store the middle query (this is set and changed whenever a query is added or removed). This way I can perform a modified version of binary search. I also store the first and last queries (head, tail), which can help in the search as well.
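
One possible reading of that idea, sketched in Python. This is an assumption about the design, not the actual project code: it uses a sorted doubly linked list that tracks head, tail and a middle node, and all names are made up. Note that starting from the middle only halves the walk; it is still linear, not a true O(log n) binary search.

```python
class Node:
    def __init__(self, value):
        self.value, self.prev, self.next = value, None, None

class SortedColumn:
    """Sorted doubly linked list that keeps head, tail and a middle pointer."""
    def __init__(self):
        self.head = self.tail = self.mid = None
        self.size, self.mid_index = 0, -1

    def _walk(self, value):
        """Start at mid when value is at or past it, else at head; stop at the first node >= value."""
        if self.mid is not None and value >= self.mid.value:
            node, index = self.mid, self.mid_index
        else:
            node, index = self.head, 0
        while node is not None and node.value < value:
            node, index = node.next, index + 1
        return node, index

    def _rebalance(self):
        """Move mid back to index (size - 1) // 2; at most one step per update."""
        if self.size == 0:
            self.mid, self.mid_index = None, -1
            return
        target = (self.size - 1) // 2
        while self.mid_index < target:
            self.mid, self.mid_index = self.mid.next, self.mid_index + 1
        while self.mid_index > target:
            self.mid, self.mid_index = self.mid.prev, self.mid_index - 1

    def find(self, value):
        node, index = self._walk(value)
        return (node, index) if node is not None and node.value == value else (None, -1)

    def insert(self, value):
        new = Node(value)
        node, index = self._walk(value)      # new goes just before node
        new.prev = node.prev if node is not None else self.tail
        new.next = node
        if new.prev is not None:
            new.prev.next = new
        else:
            self.head = new
        if node is not None:
            node.prev = new
        else:
            self.tail = new
        self.size += 1
        if self.mid is None:
            self.mid, self.mid_index = new, 0
        elif index <= self.mid_index:
            self.mid_index += 1              # mid was pushed one slot to the right
        self._rebalance()

    def remove(self, value):
        node, index = self.find(value)
        if node is None:
            return False
        if node is self.mid:                 # step mid off the node being removed
            if node.next is not None:
                self.mid = node.next
            else:
                self.mid, self.mid_index = node.prev, self.mid_index - 1
        elif index < self.mid_index:
            self.mid_index -= 1
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
        self.size -= 1
        self._rebalance()
        return True

    def to_list(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out

col = SortedColumn()
for v in (5, 3, 9, 7, 1):
    col.insert(v)
```

Removal itself is cheap once the node is found, which matches the motivation in the post. If lookups dominate, a sorted array or a tree would make the binary search real.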

 

"I used to use plain text files and directories. For instance, if a user is from New York, then when the user creates an account it is added to the directory newyork, with another directory for their username inside that folder, and then subsequent directories for their files, i.e. personal information, images, videos, emails, password information, etc."

Using files like this is a very bad idea with regard to security as well as organization. I would rather store the tables as encrypted bytes. That way I can decrypt the tables and then add them back into the database.

Edited by Unity+
Posted
"Using files like this is a very bad idea with regard to security as well as organization. I would rather store the tables as encrypted bytes. That way I can decrypt the tables and then add them back into the database."

 

Not really, Facebook does it.

Posted (edited)

"I used to use plain text files and directories. For instance, if a user is from New York, then when the user creates an account it is added to the directory newyork, with another directory for their username inside that folder, and then subsequent directories for their files, i.e. personal information, images, videos, emails, password information, etc."

 

Everything has its use.

But the above technique is too slow for more advanced projects.

In a real database you want as few slow disk accesses as possible.

Data frequently requested by db clients should be cached in the computer's memory.
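
A small sketch of such an in-memory cache, in Python. The least-recently-used eviction policy is my own choice for the example and is not stated in the post. RowCache and load_row are hypothetical names, with load_row standing in for whatever slow disk read the database does.

```python
from collections import OrderedDict

class RowCache:
    """Keep the most recently requested rows in memory; evict the least recently used."""
    def __init__(self, load_row, capacity=1024):
        self.load_row = load_row      # hypothetical callable that performs the slow disk read
        self.capacity = capacity
        self.rows = OrderedDict()     # key -> row, oldest entry first
        self.disk_reads = 0           # counted only to show the effect of the cache

    def get(self, key):
        if key in self.rows:
            self.rows.move_to_end(key)        # mark as recently used
            return self.rows[key]
        self.disk_reads += 1
        row = self.load_row(key)
        self.rows[key] = row
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)     # drop the least recently used row
        return row

# A fake "disk" lookup stands in for real I/O.
cache = RowCache(lambda key: {"id": key}, capacity=2)
for key in (1, 1, 2, 3, 1):
    cache.get(key)
```

In this run the second request for row 1 is served from memory, and the last one goes back to "disk" because row 1 was evicted when row 3 came in.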

"Not really, Facebook does it."

 

Such web portals have a separate directory for each user or so (otherwise they would end up with millions of files in one folder, which is very hard to list with scandir()). But they don't keep all the info there, just e.g. photos, files the user uploaded, etc.

Edited by Sensei
Posted

I think another feature I could add is that once an item is referenced, it is kept in reference until it is no longer needed. For example, the user can request the information once, it is stored, and then the reference can be removed when a condition is met.

Posted (edited)

Don't forget about your audience. I know, for instance, that I cannot afford $150,000 servers, so most of the time I will not need to back up all my data across a large set of servers. For instance, I know how to set up MySQL to run on a single machine, but I don't know how comfortable I would be getting MySQL to run one website over 1000 servers. On an average day Facebook goes through 600+ terabytes of data.

Edited by fiveworlds
