Languages :: PHP :: What'd be faster?


By: digitaltree U.S.A.  Date: 06/12/2002 00:00:00  English  Points: 70 Status: Answered
Quality : Excellent
If I have a process that involves parsing some information and then adding it to a MySQL database (about 20 queries per record), what would be more efficient?

Running one big, perhaps slightly slower (because it reads the records directly from a file instead of having them passed to it) PHP program to process 10,000 records in however long it takes to run.

OR

Running 10,000 individual PHP programs over a similar span of time that do the same thing, but one record each.

By how much, do you think? Any way of testing this? I'll assign some more points to this question when I get two other unanswered questions deleted; if somebody somehow gets me hard numbers here, it's worth a lot to me. Any tips to squeeze more speed from this, whatever the best solution may be? Thanks a lot,

digital
By: VGR Date: 07/12/2002 07:12:00 English  Type : Comment
On a single-processor machine and a "silly OS" like zindoze (Windows), you have no efficient choice but the first one you proposed. Spawning 10K processes will add 10K times the process start-up overhead, and you risk running out of virtual memory.

You should optimize the first choice so that it runs as fast as possible.

If you have more details, I could help.

I also do a lot of parsing (HTML pages) and at the end put the results in a DB (but not by using 20 queries per record ;-)

One solution for you could perhaps be in this direction (can't say without knowing the details of what you're doing):
-get all the data and put it in a file (instead of HTTP requests, for example)
-read the file in one go
-pre-process it (mark/memorize the start and end positions of the blocks) to accelerate the second, "true" phase. This preprocessing could be done while performing the read, I think.
-now use the preprocessed data to do your real job as fast as possible
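
A minimal sketch of the two-phase idea above, in PHP. The BEGIN/END record markers and the inline sample string are assumptions made up for illustration; in practice the data would come from the file (for example via file_get_contents) and the markers would be whatever delimits your records.

```php
<?php
// Hypothetical input: records delimited by BEGIN/END markers (an assumption).
$data = "BEGIN a=1 END junk BEGIN b=2 END";

// Phase 1: one pass to memorize where each block starts and how long it is.
preg_match_all('/BEGIN(.*?)END/s', $data, $m, PREG_OFFSET_CAPTURE);
$blocks = [];
foreach ($m[1] as [$text, $start]) {
    $blocks[] = ['start' => $start, 'length' => strlen($text)];
}

// Phase 2: the real work only touches the memorized ranges.
$payloads = [];
foreach ($blocks as $b) {
    $payloads[] = trim(substr($data, $b['start'], $b['length']));
}
print_r($payloads);
```

Whether this beats a single pass depends on how expensive the "real job" is compared with locating the blocks, so it is worth timing both on real data.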

just 0,30 € ideas ;-)
By: digitaltree Date: 07/12/2002 18:42:00 English  Type : Comment
Okay, the program is an access log parser. I put a 1x1 gif image on my site with a bunch of JavaScript-set vars following it in the query string. Now I have two options:

1) Point the image src to the PHP script; every time I get a page view it will call up the logging script, which will interpret the visitor information and put it all in a MySQL database. Trouble is, 10k page views = 10k PHP scripts executed.

2) Use an actual gif image, and then parse the access log for the entries from it. The hits on the gif image will be there, along with visitor information such as user agent, referer, IP, and of course the query string of JavaScript vars. This would involve one big nasty PHP program that runs periodically (using cron?) to rename the log file and then read in that renamed file; I use a preg_match_all to parse out all the records and all the data I need from each record. I then run through a foreach loop over the matches and interpret all the data before storing it in a MySQL database. Trouble is, for 10k page views we're looking at 200,000 simple MySQL queries, a menagerie of if, else if, else and switch statements, and a bunch more preg_match, strstr, etc.
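
For option 2, the parsing step might look roughly like the sketch below. It assumes the Apache "combined" log format; the image path /pixel.gif, the query-string variables w and h, and the two sample log lines are all made up for illustration.

```php
<?php
// Hypothetical sample of two access log lines (assumed "combined" format).
$log = <<<'LOG'
203.0.113.5 - - [07/Dec/2002:18:40:01 +0000] "GET /pixel.gif?w=1024&h=768 HTTP/1.1" 200 43 "http://example.com/a.html" "Mozilla/4.0"
203.0.113.9 - - [07/Dec/2002:18:40:02 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
LOG;

// One pass over the whole text: keep only hits on the tracking image.
$pattern = '~^(\S+) \S+ \S+ \[([^\]]+)\] "GET /pixel\.gif\?(\S*) HTTP/[\d.]+" \d+ \S+ "([^"]*)" "([^"]*)"$~m';
preg_match_all($pattern, $log, $matches, PREG_SET_ORDER);

$records = [];
foreach ($matches as $m) {
    parse_str($m[3], $vars); // query string -> array of the JavaScript-set vars
    $records[] = [
        'ip'      => $m[1],
        'time'    => $m[2],
        'vars'    => $vars,
        'referer' => $m[4],
        'agent'   => $m[5],
    ];
}
print_r($records);
```

Collecting the records in an array first also leaves room to write them in batches (MySQL accepts several rows in one INSERT ... VALUES statement), which may cut down the number of queries.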


Either way sounds rather CPU intensive, and of course there's the 30-second run time limit on my host: if my script runs longer than that, it gets killed. Not a serious problem, as I can move to my own server later, but I'd rather not start with that additional expense.

Any ideas are much appreciated! Thanks.
By: monange Date: 08/12/2002 12:00:00 English  Type : Comment
I would go with the first option, pointing the src to a PHP script, since it seems you're going to do the same number of queries or even fewer, and the CPU usage will be more "uniform" with this option. With the second option, whenever you execute the program, your CPU usage will spike.

By: digitaltree Date: 08/12/2002 13:22:00 English  Type : Comment
But isn't it a serious drain on the computer to run 10k PHP programs compared with just 1? Surely there must be an overhead when interpreting 10k 1500-line scripts vs. one 1600-line script?
By: TheFalklands Date: 09/12/2002 06:38:00 English  Type : Comment
There is surely some overhead running 10000 scripts compared to 1 longer script. Maybe you can provide a little more detail on the hardware configuration?

If you have a super-powerful database server, and a weaker web server, I would recommend putting a single PHP file running the long query. If your web server is very powerful, maybe you can get away with running multiple PHP pages.


By: VGR Date: 09/12/2002 06:57:00 English  Type : Answer
anyway, you have really NO choice ;-)

The 30s limit (PHP's max_execution_time, which a script cannot raise under "safe mode") prevents you from using solution #2, for SURE.

Stick with the "safe and secure" method #1: log every access AT THE TIME IT OCCURS. It'll slow your "program" down ***a bit***, but it saves you A LOT OF (CPU) time later. You'll just have to select stuff from your logging DB (like I do ;-)

I also have an access+error log parser (to send complaints about worms, viruses and hackers), but I run my own host. I must admit that I'm not using it any more because of the CPU time consumed, so if I had to do it again, I would stick to the "logging DB" solution, where only relevant data would be stored (instead of all the requests answered with HTTP 100 and 200 codes, which make the ***really*** interesting lines almost invisible to the eye 8-))
By: digitaltree Date: 09/12/2002 08:10:00 English  Type : Comment
Unless the running program checks the time and, after 20-25s, marks the last record processed in the log file and then quits, so that the next instance of the program starts where the previous one left off. It is possible to do something along these lines and avoid the 30s limit, but is it worth it? I don't know :(
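
Something along these lines, perhaps. This is only a sketch: the function name processSlice, the offset file, the 20-second budget and the temporary sample log are all assumptions for illustration, and it stores a byte offset in a separate state file rather than marking the log itself.

```php
<?php
// Process a log file in time-boxed slices, resuming from a saved byte offset.
function processSlice(string $logFile, string $stateFile, float $budgetSeconds, callable $handleLine): int
{
    $start  = microtime(true);
    $offset = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;

    $fh = fopen($logFile, 'rb');
    fseek($fh, $offset);

    $done = 0;
    // Check the clock before reading, so no line is read and then dropped.
    while ((microtime(true) - $start) < $budgetSeconds && ($line = fgets($fh)) !== false) {
        $handleLine(rtrim($line, "\r\n"));
        $done++;
    }

    // Remember where the next run should pick up.
    file_put_contents($stateFile, (string) ftell($fh));
    fclose($fh);

    return $done;
}

// Usage with a throwaway three-line "log" (hypothetical data).
$log   = tempnam(sys_get_temp_dir(), 'log');
$state = $log . '.offset';
file_put_contents($log, "a\nb\nc\n");

$seen  = [];
$first = processSlice($log, $state, 20.0, function (string $line) use (&$seen) {
    $seen[] = $line;
});
```

A run that is killed halfway never writes its offset, so the next run would redo that slice; saving the offset every few hundred lines would narrow that window.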
