1

This is a php script to view apache logs.

I want to save apache logs into a mysql database.

Then add some rules for tagging urls using mysql REGEXP search, like: SET tag='some tag' WHERE url REGEXP 'some pattern';

a) Should I use one table for storing all the urls every time they are accessed even if they repeat and then do the REGEXP search and apply the tag to all of them?

b) Or it would be better to save one table with unique urls, and a second table with the id of the url and the time accessed? Then the tagging will be applied this table that has less rows if the url repeats itself.

If option 'b' is better, what kind of index should I use for unique urls? varchar(4000) primary key? I was thinking about creating a md5 hash of the string of the url and use that as primary key because it will be shorter.

I ask this question because I want to know what would be best performance when:

  • Tagging many urls with regexp search
  • Importing thousands of urls into one table and make sure they are unique

Thanks!

Aaron Bertrand
182k28 gold badges406 silver badges625 bronze badges
asked May 21, 2012 at 22:08

5 Answers 5

3

Because this is really logging data that you are capturing, I would first store it in its raw form, then ETL/normalize it as needed.

For searching, you can index the front part of the URLs first few characters. Let's say the options are http:// and https:// - then limit the index size to three characters passed the protocol part.

CREATE TABLE log ( 
datetime_created DATETIME, 
url VARCHAR(1024), 
domainname VARCHAR(255),
someotherdata VARCHAR(255), 
...
INDEX `idx_domain` (domainname(3))) 
ENGINE=InnoDB ROW_FORMAT=COMPRESSED; 
  • InnoDB will allow you to search the table w/o locking it.
  • Compression will help with disk space.
  • INDEX idx_domain (domainname(3)) will index the first 3 characters of the domain name column and speed up searches. However, the strategy for REGEXP will be to try to match on the front of the domain name.

Scale will eventually be an issue if the site becomes popular, so buyer beware.

answered Jun 22, 2012 at 23:01
1

Why not use apache mod_log_sql ? It help me to send all the log information into Mysql DB directly. But let me remind you that using this on a server with heavy load will cause some issues, 1 biggest problem i see is the Mysql connect timing out.

answered Feb 4, 2013 at 15:18
0

Here is a script I created that will allow Apache log(s) to MySQL database table: http://www.shayanderson.com/perl/multiple-apache-logs-to-mysql-database-table-with-perl-script.htm It is very easy to install, setup and schedule.

answered Jan 11, 2013 at 16:17
-1

I don't think you're going to get very far using mysql (or any db) as the basis of your regular expressions. I could be wrong - I've never used REGEXP in mysql. I tend to shy away from using mysql for regular expressions.

I'm not sure what you mean by tagging but....

You generally will always have to make a decision on the balance between storage use and efficiency when it comes to your db. In other words do you could just insert new rows without regard to duplicate data (unique urls) or you could create relations between unique urls and other pertinent data (timestamp, etc) and perform joins instead.

If you relate your data your indexes will be much smaller.

I personally would use relations using innodb.

How you select your data depends on how you insert it in the first place.

You might also look into using lucene (for full text indexing and search) and memcached for speeding up some of your supporting db operations (like determining when you need to insert a new unique url in your case).

answered May 22, 2012 at 15:37
1
  • If you've never used a feature, please try to NOT discourage others from learning or using it. Commented Jun 22, 2012 at 23:01
-3

grep (or egrep) the logs. Don't use MySQL.

answered May 23, 2012 at 20:02
0

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.