Databases :: MySql :: Best way to store large volumes of data. |
|||
| By: veloguy |
Date: 13/10/2004 00:00:00 |
Points: 100 | Status: Answered Quality : Excellent |
|
I need to store test results from multiple schools. There are 70 multiple choice questions per test and probably 300 students taking the test per school. With nationwide acceptance, this could amount to a very large amount of data. I would like suggestions/opinions on the best way to store and manage that amount of data in MySQL. I will need to be able to pull the answers from each test group, each school district, and/or the whole country easily. Speed of running the reporting queries is not the most important factor. I'd rather have the data easy to manage. |
|||
| By: kenfcamp | Date: 13/10/2004 21:19:00 | Type : Comment |
|
| Hello: Interesting project you've got there .. heh If it were me, I'd utilize 4 tables Datebase - testing 1 - district/dist_code State (and or city) 2 - school/schol_code dist_code 3 - Student/stud_id schl_code 4 - test stud_id *test1 - 70 you'd be able to get the answer per student per discrict per school or all depending on how you requested the data to be retrieved. Hope this helps KC |
|||
| By: VGR | Date: 13/10/2004 21:55:00 | Type : Answer |
|
| I'm not sure to understand your notation, so forgive me if I duplicate what you thought you wrote 8-) I would do this : 1) table of school district codes 2) table of students per group [not necessary if you don't want to go down the "group" level] 3) table of groups per school district you don't need a table for the country as there is only one 4) table of tests performed (eventually with questions/answers?) 5) your results data would hold the student ID, the test ID, his/her group ID, his/her results, eventually (to speed up searches) the school district ID 6) then to search : (quoting you) [pull the answers from ] a)each test group : SELECT * FROM results WHERE groupID=... b)each school district : SELECT * FROM results WHERE schooldistrictID=... or in two queries if schooldistrictID is not in the results table : SELECT groupID FROM groupsperschool WHERE schooldistrictcodeID=... (loop on x in above) SELECT * FROM results WHERE groupID=x c)the whole country easily : SELECT * FROM results; (or a more complicated query, cascading and grouping data from tables : schooldistrictcodes -> y (as above on y) SELECT groupID FROM groupsperschool WHERE schooldistrictcodeID=y (loop on x in above) SELECT * FROM results WHERE groupID=x regards |
|||
| By: veloguy | Date: 13/10/2004 22:34:00 | Type : Comment |
|
| The question is not so much about the way to organize the schools, groups and references to them. It's about the best way to store the large volumes of answer data. 70 questions X 40 students X 40 classes X 100's of schools X 100's of districts can end up being a lot of data. Too much for any one table. I was thinking that I would have 2 databases. One for the system functions, and another to store the answers. I could create a table for each test instance or class. That table would be named the classID (which would be mapped to the date taken/school/etc with a bridge table), question ID, and the answer. I'll need to be able to run queries like this: select * from classID where age='14' and gender='f' and q14='agree' Then I can see how many 14 year old females agree with question 14. If I need to know that stat for the School I could get it from querying all tables with that SchoolID in the bridge table. Sould like a good plan - I gave it a lot of thought last night :) |
|||
| By: VGR | Date: 14/10/2004 21:43:00 | Type : Comment |
|
| "I would like suggestions/opinions on the best way to store and manage that amount of data in MySQL" "The question is not so much about the way to organize the schools, groups and references to them" OK. |
|||
| By: kenfcamp | Date: 14/10/2004 22:01:00 | Type : Assist |
|
| *The question is not so much about the way to organize On the contrary, with this amout of data, it's all about the organization, expecially if you desire to have flexibility w/ displaying results. *It's about the best way to store the large volumes of answer data. 70 questions X 40 students X 40 classes X 100's of schools X 100's of districts can end up being a lot of data. Too much for any one table. This is correct, with the example structures above you have all the school/district/student information in 2-3 tables, the test results are in the 4th w/ 70 columns (1 for each question) and a column for the student identifier. In this way you'll be able to search by school, district, student type (age) or by question/answer. *I was thinking that I would have 2 databases. One for the system functions, and another to store the answers If you want the table contents even smaller, you can create a table for each school district, or school, this will lower the amount of data (speed things up) but will require more work from you to make more complicated queries. As far as speed, I've got tables with well over 1,000,000 pieces of data, you shouldn't have any problem w/ a few thousand. KC |
|||
| By: Buraque | Date: 14/10/2004 22:30:00 | Type : Assist |
|
| I would be as detailed as I can, because of the possibilty of future needs... My tables would be like this: 1- CREATE TABLE Test(Q_ID TINY INT, Q_Content TEXT, A TEXT, B TEXT, C TEXT, D TEXT, E TEXT, Right_Answer ENUM("A","B","C","D","E"); 2- CREATE TABLE Student(Student_ID INT, St_Name CHAR(100), School_ID INT, Test_Score DEC(3,2)); 3- CREATE TABLE School(School_ID INT, School_Name CHAR(50), District_ID INT, Test_Date DATE); 4- CREATE TABLE Districts(District_ID INT, District_Name CHAR(50), State CHAR(30)); 5- CREATE TABLE Student_Answers(Student_ID INT, Q_ID INT, Selected_Answer SET("A","B","C","D","E"), Result SET("Right", "Wrong")); You can pick even the most detailed info from this 5 tables... |
|||
| By: Buraque | Date: 14/10/2004 22:32:00 | Type : Comment |
|
| OOPS! Sorry in table 5, SET should be ENUM.... |
|||
|
Do register to be able to answer |
|||
©2010 These pages are served without commercial sponsorship. (No popup ads, etc...). Bandwidth abuse increases hosting cost forcing sponsorship or shutdown. This server aggressively defends against automated copying for any reason including offline viewing, duplication, etc... Please respect this requirement and DO NOT RIP THIS SITE.
Please DO link to this page!








