[ot] log filtering

[ot] log filtering

Marc Pope-3
Does anyone know of a way to filter a large web log file down to just
robot traffic? I want to generate a log file of only spider visits to my site.

Thanks
Marc



--
------------------------------
Lasso Support: http://support.omnipilot.com/
Search the list archives: http://www.listsearch.com/lassotalk.lasso
Manage your list subscription:  
http://www.listsearch.com/lassotalk.lasso?manage


Re: [ot] log filtering

Marc Pope-3
Thanks! Very helpful tip. I am using Apache, OS X and Combined format.

Marc


On Jun 1, 2005, at 8:29 PM, noah williamsson wrote:

> [hidden email] wrote:
>> Does anyone know of a way to filter a large web log file down to just
>> robot traffic? I want to generate a log file of only spider visits to my site.
>> Thanks
>> Marc
>
> Assuming you're on Mac OS X, using Apache and have a logfile from
> Apache's combined logformat, something similar to this should work:
>
>   awk -F\" '$6~/^Agent spider|^Googlebot/{print$0}' my_combined_log
>
> This tells awk to use '"' as the field separator and then match (the
> tilde) field 6 ($6), the User-Agent string, against the regexp between
> the slashes. The regexp matches a User-Agent beginning with (that's what
> the circumflex does) "Agent spider" or (the pipe) "Googlebot".
> print $0 simply means "print the whole line".
>
>
> ..or you could "cheat" with egrep and try something like
>
>   egrep "Agent spider|Googlebot|blabla spider" my_combined_log
>
> Watch out though: Googlebot is present inside Mozilla UA strings too,
> which might give you false positives.
>
>
> A list of robot UA strings is probably available somewhere on Google.
>
>   -- noah
>
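
A minimal sketch that combines the two suggestions above: it keeps the awk
field split, so only the User-Agent field is tested, but it drops the leading
circumflex so that a User-Agent such as
"Mozilla/5.0 (compatible; Googlebot/2.1; ...)" is also caught. The function
name filter_bots and the pattern list (googlebot, slurp, msnbot, spider,
crawler) are illustrative assumptions, not a complete list of robots; extend
the list to match what actually appears in your logs.

```shell
#!/bin/sh
# Sketch: print only the log lines whose User-Agent looks like a robot.
# Assumes Apache combined log format, where splitting on double quotes
# puts the User-Agent string in field 6.
filter_bots() {
    # $1: path to a combined-format access log
    awk -F'"' '
        # tolower() gives a case-insensitive match in any POSIX awk.
        # The substrings below are illustrative, not an exhaustive list.
        tolower($6) ~ /googlebot|slurp|msnbot|spider|crawler/ { print }
    ' "$1"
}

# Usage (file names are placeholders):
#   filter_bots my_combined_log > spiders_only.log
```

Because the match is limited to field 6, a request for a URL that merely
contains the word "googlebot" is not reported, which is the kind of false
positive a plain egrep over the whole line can produce.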

