I want to split the data into 500 tables, one table per service, which means a table can only have 100,000 * 1 = 100,000 records.
The reason for this splitting is that if we are looking for serviceA, we only have 100,000 records to access, instead of scanning an extra 49 million records just to get a small piece of info about one service.
Secondly, I can easily back up the tables, as they are much smaller than backing up one full table of the larger size, even when no updates were made to them.
Lastly, I can import all the tables into one if I need to.
Also, my function uses an array of table names: if the requested name is in the array, it proceeds with that table. I also run a SHOW TABLES LIKE query to check that the table exists before continuing with any query.
So this keeps my code to one function instead of one per service.
Given my explanation, what do you think is the best way, and what are the reasons for your preference?
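As a sketch, the check I described is just a SHOW TABLES LIKE query before touching a per-service table (the table name here is only an example of my naming):

```sql
-- Does the per-service table exist?
SHOW TABLES LIKE 'subscriptions_serviceA';

-- If it does, run the actual query against that one small table
SELECT COUNT(*) FROM subscriptions_serviceA WHERE active = 1;
```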
I am no expert in these things, and it will be interesting to hear what those with more experience say, but it seems to make things unnecessarily complicated. 50m records on an appropriate server probably isn't a major issue, and the selection time needn't be massively different if the indexing is set up correctly.
MySQL won't have a problem with 50m records. I've seen tables that are bigger and MySQL still worked fine. The biggest concern, as @droopsnoot points out, is getting your indices right so querying stays fast.
Also, 50m is an upper limit, it will probably be lower than that.
If you still feel that it's too much you can also look into partitioning.
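For example, MySQL can partition a single subscriptions table by service, keeping one logical table while physically splitting the rows. A minimal sketch, with assumed column names (note the partitioning column must appear in every unique key, which is why it leads the PK):

```sql
-- One logical table, physically split into 50 partitions by service
CREATE TABLE Subscriptions (
    ServiceID INT NOT NULL,
    UserID    INT NOT NULL,
    StartDate DATE NOT NULL,
    Active    TINYINT(1) NOT NULL DEFAULT 1,
    PRIMARY KEY (ServiceID, UserID, StartDate)
)
PARTITION BY KEY (ServiceID)
PARTITIONS 50;
```

Queries that filter on ServiceID then only touch the relevant partition, which gives much of the benefit of per-service tables without 500 separate tables.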
I'm having a hard time following along because it seems overly complex for what you initially described.
Let's see if I understand correctly what you're describing. You've got a set of services that users can subscribe to. To me, this could be accomplished with three tables.
Services - this holds basic information about each service
ServiceID - PK which will be referenced later
ServiceName - Title of Service
ServiceDescription - you'll want this as it gives a description of the service which the user subscribes to
Duration - number of days the subscription is good for.
Active - boolean which allows you to show on the services which can be subscribed to at that time.
Users - holds basic info about each user
UserID - PK referenced later
UserName - visible value which the user references (trust me, you'll want this as I guarantee the user will want to change their user name and you don't want to mess with FKs).
name, address, email, etc.
Subscriptions - this holds the services users subscribe to
ServiceID - FK to the Service table
UserID - FK to the User table
StartDate
Active - allows the user to "unsubscribe"
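In MySQL DDL, the three tables above might look something like this (types, lengths, and constraint details are my assumptions):

```sql
CREATE TABLE Services (
    ServiceID          INT AUTO_INCREMENT PRIMARY KEY,
    ServiceName        VARCHAR(100) NOT NULL,
    ServiceDescription TEXT,
    Duration           INT NOT NULL,           -- days the subscription is good for
    Active             TINYINT(1) NOT NULL DEFAULT 1
);

CREATE TABLE Users (
    UserID   INT AUTO_INCREMENT PRIMARY KEY,
    UserName VARCHAR(50) NOT NULL UNIQUE,
    Name     VARCHAR(100),
    Address  VARCHAR(255),
    Email    VARCHAR(255)
);

CREATE TABLE Subscriptions (
    ServiceID INT NOT NULL,
    UserID    INT NOT NULL,
    StartDate DATE NOT NULL,
    Active    TINYINT(1) NOT NULL DEFAULT 1,
    PRIMARY KEY (ServiceID, UserID, StartDate),
    FOREIGN KEY (ServiceID) REFERENCES Services (ServiceID),
    FOREIGN KEY (UserID)    REFERENCES Users (UserID)
);
```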
With this structure, you just need to ensure your keys are indexed, and you should be able to handle well over your anticipated levels.
Active subscriptions for a user would be something like this (SQL Server syntax, but MySQL would be similar):
SELECT s.ServiceName
, sub.StartDate
, DateAdd(day, s.Duration, sub.StartDate) AS DateEnd
FROM User u
JOIN Subscription sub ON sub.userID = u.userID
JOIN Service s ON s.serviceID = sub.serviceID
WHERE u.userName = @UserName
AND GETDATE() BETWEEN sub.StartDate AND DateAdd(day, s.Duration + 1, sub.StartDate) -- add 1 day so the whole final day counts (BETWEEN itself is inclusive)
AND sub.active = 1
AND s.active = 1
active users for a particular service would be something like
SELECT u.Name -- would probably have first/middle/last separate but for basic purposes...
, sub.StartDate
, DateAdd(day, s.Duration, sub.StartDate) AS DateEnd
FROM Service s
JOIN Subscription sub ON s.serviceID = sub.serviceID
JOIN User u ON sub.userID = u.userID
WHERE s.serviceID = @ServiceID
AND GETDATE() BETWEEN sub.StartDate AND DateAdd(day, s.Duration + 1, sub.StartDate) -- add 1 day so the whole final day counts (BETWEEN itself is inclusive)
AND sub.active = 1
AND s.active = 1
This is a summary of all I said in the post: we don't need all those other tables. Let's say we are dealing with the subscription table alone, which, if all users subscribe to every service, will give us 50m records.
And having all the data in one table is just begging for other issues. My suggestion was to group like data in one place so you're not repeating information multiple times in multiple places.
Using your numbers (500 services and 100,000 users)
So if you have to update data under your plan, here is what you have to do.
If you change a service name, you have to update 100,000 rows in the database (1 row per user). If you have multiple tables like you were talking about, you first have to scan those tables to find the right one, THEN update it.
If you change a user, you have to update 500 rows (1 row per service). If you have multiple tables, you're now updating the same information in each table.
My way has
If you change a service name, you have to update 1 row
If you change a user name you have to update 1 row
It also
allows you to have a user re-subscribe to a service in a way that makes sense and is easy to report on
saves storage (not a major cost savings as storage is cheap but could be a performance savings) if you add services/users
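Concretely, those one-row updates are each a single statement (the IDs here are placeholders):

```sql
-- Rename a service: one row, no matter how many users subscribe to it
UPDATE Services SET ServiceName = 'New Name' WHERE ServiceID = 42;

-- Rename a user: one row, no matter how many services they subscribe to
UPDATE Users SET UserName = 'newname' WHERE UserID = 7;
```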
You need to check a word in 1000 records (services), and if that info is not there, you are sure it can't be there at all.
Why then go searching for such info in 50,000,000 records when you can easily pick out the right table and search it?
Getting a total amount from 50,000,000 records can not be faster than getting it from 100,000.
If you change a service name, simply go to the table with the old name and run one UPDATE setting servicename = 'new name'.
You don't need to search because you already know the service name.
But there is a valid point about changing a userid: if the whole thing is in one table, it will be one UPDATE statement,
but if it is in 500 tables, then I have to run 500 updates, targeting each table in turn.
Then again, no one changes a userid, because that would mean updating every table where the userid appears.
Number one, you're assuming that every user is going to subscribe to every service. That's unlikely.
But even if they did, 50M rows is doable if you have the correct indexes in place. It all depends on the server capacity (memory, HD space, load, etc.). Software-wise, MySQL supports something like 300 trillion rows, Postgres almost 10 billion, SQL Server up to 9 quintillion rows, and Oracle claims to be "unlimited", but my experience tells me otherwise, though it's most certainly higher than 50 million.
All you need is that subscription table, and queries similar to what I provided above will get you whatever you need. The indexes will filter out the data you don't need/want, and you should be fine.
Generally speaking, and not specifically in terms of MySQL, indexing is a method of helping to find rows. If you have an index for the service id in your subscriptions table, when you want to find all of the subscribers for one service id, MySQL will use the index rather than just searching through all 50m records one by one until it finds one.
Think of it as being similar to an index in a book. If you look in the index, it'll tell you which pages specific terms (or people, or whatever) are mentioned on, so you can find them quickly and easily. When you optimise your table design, you take into account what you might search on to decide which columns should be indexed. The gain is faster searching on those columns; the loss is that the more things you index, the longer it takes to perform updates, though I expect that's done in the background anyway.
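You can check whether MySQL actually uses an index for a given query with EXPLAIN, e.g. (the index name here is just an example):

```sql
ALTER TABLE Subscriptions ADD INDEX idx_user (UserID);

-- The "key" column of the EXPLAIN output should show idx_user being
-- used, instead of a full scan through all 50m rows
EXPLAIN SELECT * FROM Subscriptions WHERE UserID = 123;
```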
I have not used indexes before; the link you sent is a comprehensive lecture on indexing, and I appreciate you for that.
However, I can't go into such rigorous reading or study right now.
Show me in a few lines what indexing is, or what it should look like for my issue, and I will see if it is something I should bother with for this particular project.
You only need to generate indexes on those fields which will be searched on consistently. So if you're always searching on UserName, then add an index on that field. The basic syntax (in MySQL) is:
ALTER TABLE tableName ADD INDEX indexName (columnName);
Primary keys automatically generate indexes. Your subscription table should have a PK of (ServiceID, UserID, StartDate) (or whichever fields you have in the lookup table).
FKs do not, but it may behoove you to index them for sizeable tables; plus FKs can enforce data integrity in the tables.
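Putting that together for a subscription table keyed on (ServiceID, UserID, StartDate), a sketch of the extra indexes might be (names are examples):

```sql
-- The composite PK already provides an index led by ServiceID, so
-- lookups by service are covered; add one for lookups by user as well
ALTER TABLE Subscriptions ADD INDEX idx_sub_user (UserID);

-- The FK enforces integrity; in InnoDB it can reuse idx_sub_user
ALTER TABLE Subscriptions
    ADD CONSTRAINT fk_sub_user FOREIGN KEY (UserID) REFERENCES Users (UserID);
```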