@pernia
> i assume by page manager u mean the mmu?
The page manager is the component of the database (it's part of the database software, not the OS) responsible for reading and writing pages. It usually has an LRU cache of pages it has recently fetched from disk, so it can sometimes return them quicker. Pages come in several types that indicate what is stored in them (data tuples, table definitions, indexes, mappings from tables to the pages that hold their data), but the big one here is the data page. Pages are addressed by their page number, which is literally just the order they are in (usually). A data page holds data tuples. Data tuples are the rows in a table, and they can be logically addressed by (page_number, row_id).
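To make that concrete, here is a toy sketch of a page manager with an LRU cache. Everything in it (the `PageManager` class, the 4096-byte `PAGE_SIZE`, the single-file layout) is an assumption for illustration, not how any particular database does it.

```python
import os
import tempfile
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size; real databases vary


class PageManager:
    """Toy page manager: reads fixed-size pages from one file, with an LRU cache."""

    def __init__(self, path, cache_pages=2):
        self.path = path
        self.cache_pages = cache_pages
        self.cache = OrderedDict()  # page_number -> bytes, least recently used first
        self.disk_reads = 0

    def read_page(self, page_number):
        if page_number in self.cache:
            self.cache.move_to_end(page_number)  # mark as most recently used
            return self.cache[page_number]
        with open(self.path, "rb") as f:
            f.seek(page_number * PAGE_SIZE)  # the page number is just its position
            page = f.read(PAGE_SIZE)
        self.disk_reads += 1
        self.cache[page_number] = page
        if len(self.cache) > self.cache_pages:
            self.cache.popitem(last=False)  # evict the least recently used page
        return page


def demo_page_manager():
    # Build a 3-page file where page n is filled with the byte value n.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for n in range(3):
            f.write(bytes([n]) * PAGE_SIZE)
        path = f.name
    try:
        pm = PageManager(path, cache_pages=2)
        pm.read_page(0)  # disk
        pm.read_page(0)  # cache hit
        pm.read_page(1)  # disk
        pm.read_page(2)  # disk, evicts page 0
        pm.read_page(0)  # disk again
        return pm.disk_reads
    finally:
        os.remove(path)
```

The second read of page 0 costs nothing, but once two other pages have pushed it out of the cache it has to come off disk again.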
> wouldn't moving json from disk to memory have to happen anyway? why would it be slower in a db than from disk?
It does have to happen anyway, but when you get it from a database instead of ripping it straight from a file, the database first has to go and find the data you want, and only then read it off disk. If you already know which file you want to rip the json out of, you can skip all the work of locating it and just call fopen.
> and wouldn't reading the data from disk be faster since its a B tree, rather than reading the file sequentially?
Indexes are b-trees. Data tuples are usually just sort of chucked in there in the order they are created, unless you are doing something fancy like maintaining a physical sort order within the pages, which would be really expensive for CRUD operations, as you would potentially have to shuffle your entire table around for every insert.
> then in scenario b, that would mean reading the file sequentially to load it from disk to memory,
nginx does this, and it does it in fancy optimized ways that stream the file rather than loading the entire file into memory in one big buffer and then flushing it out.
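The general idea of streaming versus one big buffer looks roughly like this. This is not how nginx is implemented, just a minimal sketch of chunked copying; `stream_file` and the 64 KiB chunk size are made up for the example.

```python
import io


def stream_file(src, dst, chunk_size=64 * 1024):
    """Copy src to dst one chunk at a time; returns the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)  # never holds more than chunk_size in memory
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

With a real file and a socket you would pass those in place of the in-memory streams; memory use stays bounded by the chunk size no matter how big the file is.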
Scenario b is faster if you engineer the files to be laid out in such a way that you don't have to look for them. Placing them strategically means that you just know where they are based on filename. If you did have to search them with like grep and shit then yes that would be much slower.
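One way "you just know where they are" can work is to derive the path from the key. A sketch, assuming numeric ids and a hypothetical `/srv/static` root (both invented here, pick whatever layout suits your data):

```python
from pathlib import PurePosixPath


def json_path_for(root, user_id):
    """Map an id straight to a file path; no searching, no index."""
    name = f"{user_id:08d}"  # zero-pad so every id has the same shape
    # Two levels of subdirectories keep any single directory from getting huge.
    return PurePosixPath(root) / name[:2] / name[2:4] / f"{name}.json"
```

So id 1234567 lives at `/srv/static/01/23/01234567.json`, and the web server can be pointed at it directly.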
You have some misconceptions about where exactly the b-tree comes into play. The b-tree powers indexes. To fetch indexed data you first consult the index by traversing its b-tree (fast), and then you still have to fetch the data from its data page if the index doesn't itself contain the field you wanted (which it doesn't in our scenario). The index IS way faster than doing a sequential scan of every data page that holds data for a given table and checking each tuple in it for the one, or however many, your query wants. With an index you know the address of the data you want, but you still have to fetch it off the disk (unless it's cached by the page manager, but let's pretend it isn't).
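Here is a toy comparison of the two access paths. A sorted list searched with `bisect` stands in for the b-tree (an assumption to keep it short, a real index is an on-disk tree), and the `pages` layout and function names are invented for the example.

```python
import bisect

# pages[page_number] is a list of (key, value) rows, in insertion order.
pages = [
    [(17, "q"), (3, "c")],
    [(42, "z"), (8, "h")],
    [(25, "y")],
]


def seq_scan(pages, key):
    """Check every tuple on every page until the key turns up."""
    pages_read = 0
    for page in pages:
        pages_read += 1
        for row_key, value in page:
            if row_key == key:
                return value, pages_read
    return None, pages_read


def build_index(pages):
    """Sorted (key, page_number, row_id) entries; a stand-in for a b-tree."""
    return sorted(
        (row_key, page_number, row_id)
        for page_number, page in enumerate(pages)
        for row_id, (row_key, _) in enumerate(page)
    )


def index_lookup(pages, index, key):
    """Find the address in the index, then fetch exactly one data page."""
    i = bisect.bisect_left(index, (key,))
    if i == len(index) or index[i][0] != key:
        return None, 0
    _, page_number, row_id = index[i]
    return pages[page_number][row_id][1], 1  # still one data page fetch
```

The index turns three page reads into one for key 25, but that one read of the data page never goes away.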
The database CAN'T be faster than simply reading off a static file in this scenario. It is simply more work: a superset of the work done by just ripping the file off the disk and out onto the network.
The limitation is that not every scenario allows you to engineer the database out of the picture. This is not a universally applicable strategy. The database offers flexibility and makes difficult things possible, but the realization here is that all you're really doing is serving a static file, and that this isn't necessarily a difficult thing (if you're clever about it).