BlogHarbor Home Page
Logfiles - the ongoing question
 
BlogHarbor Community Forum Index -> Beginner's Lounge
gristgal



Joined: 05 Jan 2006
Posts: 209
Location: Mississippi

Posted: Sat Jul 01, 2006 10:17 am    Post subject: Logfiles - the ongoing question

John,

Would you mind giving us an update on the status of the logfile scraping fix? I can assure you that I'm still not getting reconcilable results.

Recent results
30th - page views exceeded logfile records by 53 - error
29th - page views fewer than logfile records by 23 - okay
28th - page views exceeded logfile records by 364 - error
27th - page views exceeded logfile records by 716 - error
26th - page views fewer than logfile records by 125 - okay
25th - page views exceeded logfile records by 78 - error
24th - page views exceeded logfile records by 139 - error
23rd - page views exceeded logfile records by 200 - error
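The okay/error rule in the list above comes down to one comparison: a day passes only when the logfile holds at least as many records as Site Stats reports page views, since a complete raw log also records images, stylesheets and other non-page hits. A minimal Python sketch, assuming you already have the two counts for each day; `classify_day` and `describe` are hypothetical names, not part of any BlogHarbor tool.

```python
def classify_day(page_views: int, log_records: int) -> str:
    """'okay' if the logfile could be complete, otherwise 'error'.

    A complete raw log has one record per hit, and every page view is a hit,
    so the record count should never fall below the page-view count.
    """
    return "okay" if log_records >= page_views else "error"


def describe(day: str, page_views: int, log_records: int) -> str:
    """Format one line in the same style as the list above."""
    diff = page_views - log_records
    if diff > 0:
        relation = f"page views exceeded logfile records by {diff}"
    elif diff < 0:
        relation = f"page views fewer than logfile records by {-diff}"
    else:
        relation = "page views equal to logfile records"
    return f"{day} - {relation} - {classify_day(page_views, log_records)}"
```

For example, with the figures reported later in this thread for the 8th (739 page views; 262 records in the truncated file, 1071 lines after the rerun), `describe("8th", 739, 262)` gives `8th - page views exceeded logfile records by 477 - error`, and `describe("8th", 739, 1071)` gives `8th - page views fewer than logfile records by 332 - okay`.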

As you can see, the results are mixed, but unfortunately not greatly improved. I know you were saying that some people are more freaked by the possibility that spam trackbacks are being included in stats.

ICBW, but ISTM that if you're going after eliminating spam from counts before getting count totals correct, you're doing things in the wrong sequence. In accounting, you collect all of the whatever-it-is first, then find the bad whatsits. I don't claim that accounting has a great deal to teach IP, but I suggest that auditing principles and procedures have merit regardless of context; they give you a pathway through the maze. Think of it as trying to debug a program without comments: you wind up going back and inserting comments/questions to help you cut your way through the ¢®@ρ - sorta like Major Winchester's "one thing at a time."
john
Site Admin


Joined: 16 Mar 2004
Posts: 3434

Posted: Sun Jul 02, 2006 10:59 am    Post subject: Re: Logfiles - the ongoing question

Quote:
Would you mind giving us an update on the status of the logfile scraping fix? I can assure you that I'm still not getting reconcilable results.

You are right, this problem still does not appear to be resolved. Due to the Canada Day/Independence Day weekend, I won't be able to get an answer to this until sometime next week, but I will be sure to find out what is going on with the log files.

Quote:
ICBW, but ISTM that if you're going after eliminating spam from counts before getting count totals correct, you're doing things in the wrong sequence.

Thanks for the advice, but that is not the case. These are your raw logs; they have not been modified in any way. The problem seems to be in collating the logs from every server that could possibly have served your blog on a given day. On top of that, some of the servers log in a different format: the caching servers log slightly differently than the non-caching servers you are actually accessing when you are an authenticated user. The two formats must be "normalized" and merged back into one single format.
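The normalization step described here (parse each server's format, convert to one format, merge into a single day's log) might look roughly like the sketch below. BlogHarbor's actual formats aren't given in the thread, so this assumes, purely for illustration, that the non-caching servers write Apache-style Common Log Format and the caching servers write a Squid-style native log (`epoch.millis elapsed client code/status bytes method URL ...`). `Hit`, `parse_clf`, `parse_cache`, `to_clf` and `normalize` are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Hit:
    """One request, in a server-independent form."""
    ts: datetime  # always stored in UTC
    client: str
    method: str
    path: str
    status: int
    size: int


# Assumed format A (non-caching servers): Common Log Format.
CLF = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line: str) -> Hit:
    m = CLF.match(line)
    if m is None:
        raise ValueError(f"unrecognized CLF line: {line!r}")
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    size = 0 if m["size"] == "-" else int(m["size"])
    return Hit(ts.astimezone(timezone.utc), m["client"], m["method"],
               m["path"], int(m["status"]), size)


# Assumed format B (caching servers): Squid-style native log,
# "epoch.millis elapsed client code/status bytes method URL ...".
def parse_cache(line: str) -> Hit:
    f = line.split()
    ts = datetime.fromtimestamp(float(f[0]), tz=timezone.utc)
    status = int(f[3].split("/")[1])
    return Hit(ts, f[2], f[5], f[6], status, int(f[4]))


def to_clf(h: Hit) -> str:
    # The cache format has no protocol version, so a placeholder is written.
    stamp = h.ts.strftime("%d/%b/%Y:%H:%M:%S %z")
    return f'{h.client} - - [{stamp}] "{h.method} {h.path} HTTP/1.1" {h.status} {h.size}'


def normalize(clf_lines, cache_lines):
    """Parse both formats, merge in time order, and emit one CLF log."""
    hits = [parse_clf(line) for line in clf_lines if line.strip()]
    hits += [parse_cache(line) for line in cache_lines if line.strip()]
    return [to_clf(h) for h in sorted(hits, key=lambda h: h.ts)]
```

Converting every timestamp to UTC before sorting matters because servers may log in different time zones; which calendar day a late-evening hit lands in then depends on the zone chosen when each day's file is cut (this forum, for instance, displays times as GMT - 5).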
john
Site Admin


Joined: 16 Mar 2004
Posts: 3434

Posted: Wed Jul 05, 2006 6:01 pm    Post subject: Re: Logfiles - the ongoing question

Changes were made yesterday to improve the accuracy of the log files. I just checked your blog's stats for 2006-07-04 and I think you'll find that Site Stats and log files show much more correlation now. Any feedback you have would be appreciated.
gristgal



Joined: 05 Jan 2006
Posts: 209
Location: Mississippi

Posted: Thu Jul 06, 2006 8:43 am

I'd noticed that the logfiles for the 1st & 2nd were short by less than had been usual. I now have three days in a row where the number of records in the logfiles is in excess of the number of pageviews recorded by the other system.

I'm reluctant to declare victory at this point - understandably, I think? However, it does look as though it's time for me to start downloading and testing different analysis programs. That will likely provide a great deal more information. I will share anything relevant.

Until it began to look as though the information from the logfiles was worth the effort to look at it, I couldn't see bothering. I've got 2½ months of logfiles I saved (minus about 2 days when the size of the logfiles was plainly ridiculous, and not revised).

If the next couple days seem to bear out that things are working, is there a chance you'd rerun the 1st and 2nd? If all is now fixed, that would provide us users with a full month of accurate data, come August 1st.
john
Site Admin


Joined: 16 Mar 2004
Posts: 3434

Posted: Thu Jul 06, 2006 8:49 am

Quote:
If the next couple days seem to bear out that things are working, is there a chance you'd rerun the 1st and 2nd? If all is now fixed, that would provide us users with a full month of accurate data, come August 1st.


Unfortunately, we won't be able to rerun those stats.
gristgal



Joined: 05 Jan 2006
Posts: 209
Location: Mississippi

Posted: Sun Jul 09, 2006 10:20 pm

John, it looks like your fix has blown up. I'm a bit dubious about the 7th, but superficially it looks okay.

My logfile for the 8th, OTOH, contains a grand total of 26.6k. Given that there are 739 page views for that day, it's a disaster. Of the other logfiles currently showing, the smallest is 400k; the largest is 709k.

I'm sure you can understand why I feel it's a waste of time for me to even download it. If you want a record count, I'll do it. Otherwise, I'd rather not bother until you rerun it.
gristgal



Joined: 05 Jan 2006
Posts: 209
Location: Mississippi

Posted: Mon Jul 10, 2006 3:54 pm

Well, the logfile for the 8th has gone from a ridiculous 26.6k to a merely inadequate 162k. Although I knew beforehand that it would be wrong, I downloaded and counted it. There are 262 records in it, contrasted with 739 page views. In other words, it contains at best roughly a third of the correct number of records.
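Counting records and comparing them with page views takes only a few lines. A minimal sketch, assuming one hit per line in the downloaded logfile (usual for web-server access logs, but an assumption about BlogHarbor's files); `count_records` and `coverage` are hypothetical names.

```python
from typing import Iterable


def count_records(lines: Iterable[str]) -> int:
    """Count non-blank lines, treating each one as a single logged hit."""
    return sum(1 for line in lines if line.strip())


def coverage(records: int, page_views: int) -> float:
    """Fraction of the day's page views the logfile could account for."""
    return records / page_views
```

An open file object can be passed straight to `count_records`, e.g. `with open(path) as f: n = count_records(f)`, where `path` is wherever the downloaded logfile was saved. For the 8th's truncated file, `f"{coverage(262, 739):.0%}"` gives `35%`.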

I don't wish to be either repetitive or boring, but otherwise I fear another round of talking past each other. For reference, the logfile for the 7th is 427k (the next smallest currently displayed), and the 7th had nearly a hundred fewer page views than the 8th. As mentioned previously, I suspect the 7th's logfile may be incomplete, based more on its composition than on the number of records.

The logfile for the 9th is 782k, and the 9th had almost the same number of page views as the 8th. In other words, I believe the correct logfile for the 8th should be closer in size to the 9th's than to the 7th's; in any case, it really cannot be smaller than the 7th's, or have fewer records, and still be complete.

Should we assume that the revised scraping procedure is inadequate, or is this part of operations being affected by the latest wave of spamming?
john
Site Admin


Joined: 16 Mar 2004
Posts: 3434

Posted: Mon Jul 10, 2006 6:00 pm

Quote:
I don't wish to be either repetitive or boring, but otherwise I fear another round of talking past each other.

I don't believe we are talking past each other. The stats process prematurely aborted the other night while building the logs and we had to rerun the process this morning. Our apologies for the delay.

I do believe you may have just downloaded the logfile for the 8th while it was in the process of being regenerated; I just downloaded the file and it was 708 Kb in length (720,901 bytes) and contained 1071 lines.

When I removed the lines containing non-HTML file extensions (.gif|.png|.jpg|.css|.xml|.js|.txt), the resulting number and the number shown in your HTML requests in Site Stats differed by just 0.42%...
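This check can be reproduced on a downloaded logfile: drop requests for the non-page extensions listed here, count what remains, and compare it with the HTML-request figure in Site Stats. A minimal sketch, assuming each line carries a quoted request such as `"GET /path HTTP/1.1"` (as in Common Log Format); the names are hypothetical, and the thread doesn't say which figure served as the base for the percentage, so this one divides by the Site Stats count.

```python
import re
from typing import Iterable

# Non-page extensions taken from the list in the post above.
NON_HTML_EXTS = (".gif", ".png", ".jpg", ".css", ".xml", ".js", ".txt")
# Assumes the first quoted field starting with an HTTP method is the request.
REQUEST = re.compile(r'"[A-Z]+ (\S+)')


def is_page_request(line: str) -> bool:
    m = REQUEST.search(line)
    if m is None:
        return False  # no recognizable request line; don't count it
    path = m.group(1).split("?", 1)[0].lower()  # ignore query strings and case
    return not path.endswith(NON_HTML_EXTS)


def page_requests(lines: Iterable[str]) -> int:
    """Count log lines whose request is not for a non-page file."""
    return sum(1 for line in lines if line.strip() and is_page_request(line))


def percent_difference(log_count: int, stats_count: int) -> float:
    """Absolute difference as a percentage of the Site Stats figure."""
    return abs(log_count - stats_count) / stats_count * 100
```

The extension list covers only what is quoted above; other static files (for example `.jpeg` or `.ico`) would need to be added to `NON_HTML_EXTS` if they appear in your logs.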
gristgal



Joined: 05 Jan 2006
Posts: 209
Location: Mississippi

Posted: Tue Jul 11, 2006 10:33 pm

I see, and have now downloaded and counted results. I'm delighted to say that everything now looks great for the 8th, and it was the only one that was visibly off since you said the "fix was in", i.e., 3rd July.

Thanks for getting that day rerun.
john
Site Admin


Joined: 16 Mar 2004
Posts: 3434

Posted: Wed Jul 12, 2006 8:28 am

The software was updated on 7/4, so the first stats to be run under that software version were for 7/4, not 7/3.
gristgal



Joined: 05 Jan 2006
Posts: 209
Location: Mississippi

Posted: Wed Jul 12, 2006 12:57 pm

Okay. For good or ill, then, you've made the official "fixed" date memorable - for USians, at any rate.
All times are GMT - 5 Hours