ehsmeng2
08-04-2008, 12:03 PM
I do the technical stuff at a site hosted by Rimu that is a marketplace for stock video footage (www.pond5.com). The site is growing strongly, with 55,000+ clips online, a full-text search index (PostgreSQL tsearch2) over clips and messages, statistics, etc.
My partner wanted us to start collecting very fine-grained statistics so that we can do Amazon-style "you might also like..." recommendations better. Unfortunately, this is a bit heavy on the database (a really large number of small commits). In a time when even grandmothers save cookie recipes in databases, it is easy to lose perspective and forget that a database is dog slow for certain kinds of usage. If you just want to keep some numbers in the air for a while (numbers that can be regenerated or aren't critical), an array in most programming languages would be orders of magnitude faster. But after PHP has generated the web page, everything the script held in memory is thrown away, and the next request starts from scratch.
Take a forum thread. Rendering a message takes some work from the PHP interpreter. If there are 100 subscribers to this forum, a new post might have to be rendered 100 times, possibly within a short time interval.
Or say that a user's login is represented by a number in a cookie; on every page request, including every AJAX call, that number is looked up in a login table.
Or take the question of which items sold best last week. I wouldn't want 100 users asking for that at the same time. Even if the requests are spread out over the day, each one fills up OS buffers with data that is probably useful only for this query.
There is a simple key-value store (à la Berkeley DB) that does not use the disk; instead, all data is kept in memory. Every entry has a "time to live", and the interface is essentially get/set/delete. This wonderful little daemon is called memcached, and bindings exist for many languages, including PHP.
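To make the get/set/delete-with-TTL semantics concrete, here is a toy, single-process model in PHP (assuming PHP 8 for the `mixed` type). `ToyTtlStore` is a hypothetical name and this is not memcached itself: the real daemon is a separate process shared by every request, and it may also evict entries before their TTL when it runs out of memory.

```php
<?php
// Toy, single-process model of memcached's get/set/delete-with-TTL interface.
// Illustration only: real memcached is a separate daemon shared by all
// requests, and it may evict entries before their TTL when memory runs low.
class ToyTtlStore
{
    /** @var array<string, array{0: string, 1: int}> key => [serialized value, expires-at] */
    private array $items = [];
    /** @var callable(): int returns the current Unix time */
    private $clock;

    public function __construct(?callable $clock = null)
    {
        // Injectable clock so expiry can be shown without sleeping.
        $this->clock = $clock ?? 'time';
    }

    // $ttl is in seconds; 0 means "never expire". This toy only handles
    // relative TTLs, not the absolute timestamps memcached also accepts.
    public function set(string $key, mixed $value, int $ttl = 0): void
    {
        $expiresAt = $ttl === 0 ? PHP_INT_MAX : ($this->clock)() + $ttl;
        // Keep a serialized copy, so the cached value is detached from $value.
        $this->items[$key] = [serialize($value), $expiresAt];
    }

    // Returns false on a miss or once the entry has expired, like memcache_get().
    public function get(string $key): mixed
    {
        if (!isset($this->items[$key])) {
            return false;
        }
        [$data, $expiresAt] = $this->items[$key];
        if (($this->clock)() >= $expiresAt) {
            unset($this->items[$key]); // drop expired entries lazily
            return false;
        }
        return unserialize($data);
    }

    public function delete(string $key): bool
    {
        $existed = isset($this->items[$key]);
        unset($this->items[$key]);
        return $existed;
    }
}
```

An entry set with a 60-second TTL is returned for the next 59 seconds and then reads as a miss, so the caller regenerates it. That is exactly the "numbers that can be regenerated" contract described above.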
<?php
// Done once at init, or lazily on first use:
$memcache_object = @memcache_connect('localhost', 11211);

$expensive = false;
if ($memcache_object !== false) {
    $expensive = memcache_get($memcache_object, "this is a key");
}
if (false === $expensive)
{
    // Not in cache (or memcached unreachable):
    // load something expensive from the database,
    // or generate expensive HTML.
    $expensive = "some value, struct, etc. This follows the limitations of php's serialize";
    if ($memcache_object !== false) {
        // flag 0 = no compression; expire after 24 hours
        memcache_set($memcache_object, "this is a key", $expensive, 0, 60*60*24);
    }
}
// use $expensive
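The same get-or-compute pattern can be wrapped in a small helper, so each slow spot (rendered forum messages, login lookups, the weekly best-seller list) becomes a one-liner. This is a minimal sketch, assuming PHP 8 and the same memcache extension functions used above; `cache_remember()`, `remember_in_memcache()`, `load_bestsellers()` and the key names are hypothetical.

```php
<?php
// Hypothetical helper generalizing the read-through pattern above.
// $get must return false on a miss (as memcache_get() does), so a cached
// value of false is indistinguishable from a miss and gets recomputed.
function cache_remember(callable $get, callable $set, string $key, int $ttl, callable $compute): mixed
{
    $value = $get($key);
    if ($value !== false) {
        return $value; // cache hit: skip the expensive work
    }
    $value = $compute(); // cache miss: query the database or render the HTML
    $set($key, $value, $ttl);
    return $value;
}

// Wiring to the PHP memcache extension used in the post. Assumes memcached
// on localhost:11211; if the connection fails, it just computes every time.
function remember_in_memcache(string $key, int $ttl, callable $compute): mixed
{
    static $mc = null; // connect once per request, on first use
    if ($mc === null) {
        $mc = @memcache_connect('localhost', 11211);
    }
    if ($mc === false) {
        return $compute();
    }
    return cache_remember(
        fn (string $k) => memcache_get($mc, $k),
        fn (string $k, $v, int $t) => memcache_set($mc, $k, $v, 0, $t),
        $key,
        $ttl,
        $compute
    );
}

// Example (load_bestsellers() is hypothetical): cache last week's top sellers
// for 15 minutes, so concurrent users share a single database query.
// $top = remember_in_memcache('bestsellers:last-week', 15 * 60, fn () => load_bestsellers());
```

Passing `$get` and `$set` as callables keeps the caching logic independent of the client library, so it can be exercised with a plain PHP array where no memcached daemon is running.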
The pattern above has been applied to some of the worst performers on our site. We'll also start saving statistics to known keys (a key holding a start number and an end number), and once an hour a cron batch will pull these values down and write them to disk in one big commit.
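The statistics plan can be sketched as atomic counters in memcached plus an hourly cron flush. This assumes the object-oriented `Memcache` class from the same PECL extension (its `increment`, `add`, `get` and `decrement` methods); the function names, key naming and the `$persist` callback are hypothetical. Counters live only in memory, so a daemon restart or an eviction loses whatever was counted since the last flush, which fits the premise that these numbers aren't critical.

```php
<?php
// Sketch of hourly-flushed hit counters kept in memcached.
// $mc is a Memcache object (PECL memcache OO API) or anything with the same
// increment/add/get/decrement methods. Key names are hypothetical.

// Called on every page view: bump a counter without touching the database.
function stat_hit($mc, string $key): void
{
    if ($mc->increment($key, 1) !== false) {
        return; // counter already existed
    }
    // First hit: add() fails if another request created the key in between,
    // in which case we increment the key that request created.
    if (!$mc->add($key, 1, 0, 0)) {
        $mc->increment($key, 1);
    }
}

// Run from cron once an hour. $persist receives [key => count] and is
// expected to write everything to the database in one transaction.
function stat_flush($mc, array $knownKeys, callable $persist): array
{
    $counts = [];
    foreach ($knownKeys as $key) {
        $n = (int) $mc->get($key); // a missing key (false) becomes 0
        if ($n > 0) {
            $counts[$key] = $n;
        }
    }
    if ($counts) {
        $persist($counts);
        // Subtract what was saved instead of resetting to 0, so hits that
        // arrived after the get() above are kept for the next run.
        foreach ($counts as $key => $n) {
            $mc->decrement($key, $n);
        }
    }
    return $counts;
}
```

If `$persist` throws, the counters are left untouched and the next hourly run picks them up again; the remaining gap (a crash between the commit and the decrements) would double-count one hour, which seems acceptable for "you might also like" statistics.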
A wonderful program. I grade it five nerd glasses: 88888. I guess this is what it's like to become religious: "Oh, I've seen the light." Fair enough: the next time someone talks to me about Jesus, I'll tell him about the memcache daemon that I worship.
Best regards,
Marcus