I've decided to use the Python logging module, because the messages generated by Twisted on standard error are too long, and I want meaningful messages at the INFO level, such as …
I know this is old, but it was a really helpful post, since the class still isn't properly documented in the Scrapy docs. Also, we can skip importing logging and use Scrapy's log module directly. Thanks all!
from scrapy import log

# Open the log file in append mode so previous runs are kept
logfile = open('testlog.log', 'a')

# Write Scrapy's log messages at DEBUG level and above to the file
log_observer = log.ScrapyFileLogObserver(logfile, level=log.DEBUG)
log_observer.start()
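For context, this API comes from pre-1.0 Scrapy, where a natural place to start the observer is the spider's __init__. A rough sketch under that assumption (the spider name and file path are made up):

from scrapy import log
from scrapy.spider import BaseSpider  # pre-1.0 spider base class

class MySpider(BaseSpider):  # hypothetical spider
    name = 'my_spider'

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        # Append Scrapy's log output to testlog.log for this spider's runs
        logfile = open('testlog.log', 'a')
        log.ScrapyFileLogObserver(logfile, level=log.DEBUG).start()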
As of Scrapy 2.3, none of the answers above worked for me. In addition, the solution found in the documentation caused the log file to be overwritten on every run, which is of course not what you want from a log. I couldn't find a built-in setting that changes the mode to "a" (append). I achieved logging to both file and stdout with the following configuration code:
import logging

from scrapy.utils.log import configure_logging

configure_logging(settings={
    "LOG_STDOUT": True,
})

filename = "scrapy.log"  # your log file path
file_handler = logging.FileHandler(filename, mode="a")  # "a" appends instead of overwriting
formatter = logging.Formatter(
    fmt="%(asctime)s,%(msecs)d %(name)s %(levelname)s %(message)s",
    datefmt="%H:%M:%S",
)
file_handler.setFormatter(formatter)
file_handler.setLevel(logging.DEBUG)
logging.root.addHandler(file_handler)
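This configuration has to run before the crawl starts; one place that works is a standalone launcher script. A sketch under that assumption, with a hypothetical minimal spider for illustration:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):  # hypothetical spider for illustration
    name = 'my_spider'
    start_urls = ['https://example.com']

    def parse(self, response):
        self.logger.info('visited %s', response.url)

# Run the configuration code above here, then start the crawl:
process = CrawlerProcess(settings={"LOG_STDOUT": True})
process.crawl(MySpider)
process.start()  # blocks until the crawl finishes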
As the official Scrapy documentation says:
Scrapy uses Python’s builtin logging system for event logging.
So you can configure your logger just as you would in a normal Python script.
First, you have to import the logging module:
import logging
You can add this line to your spider:
logging.getLogger().addHandler(logging.StreamHandler())
This adds a stream handler that logs to the console.
After that, you have to configure the log file path.
Add a dict named custom_settings which contains your spider-specific settings:
custom_settings = {
    'LOG_FILE': 'my_log.log',
    'LOG_LEVEL': 'INFO',
    # ... you can add more settings
}
The whole class looks like:
import logging

import scrapy

class AbcSpider(scrapy.Spider):
    name: str = 'abc_spider'
    start_urls = ['your_url']
    custom_settings = {
        'LOG_FILE': 'my_log.log',
        'LOG_LEVEL': 'INFO',
        # ... you can add more settings
    }
    logging.getLogger().addHandler(logging.StreamHandler())

    def parse(self, response):
        pass
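On top of that, every spider also exposes a built-in self.logger (named after the spider), so inside callbacks you can log without any extra setup. A small illustration (the URL is a placeholder):

import scrapy

class AbcSpider(scrapy.Spider):
    name = 'abc_spider'
    start_urls = ['your_url']

    def parse(self, response):
        # Goes through Python's logging system and honors LOG_FILE / LOG_LEVEL
        self.logger.info('Parsed %s', response.url)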
It is very easy to redirect output using: scrapy <your-command-and-args> 2>&1 | tee -a logname
This way, everything Scrapy outputs to stdout and stderr is appended to the logname file and also printed to the screen.
You want to use the ScrapyFileLogObserver.
import logging
from scrapy.log import ScrapyFileLogObserver

# Note: 'w' truncates the log on every run; use 'a' to append instead
logfile = open('testlog.log', 'w')
log_observer = ScrapyFileLogObserver(logfile, level=logging.DEBUG)
log_observer.start()
I'm glad you asked this question; I've been wanting to do this myself.
For everyone who landed here before reading the current version of the documentation:
import logging
from scrapy.utils.log import configure_logging
configure_logging(install_root_handler=False)
logging.basicConfig(
    filename='log.txt',
    filemode='a',
    format='%(levelname)s: %(message)s',
    level=logging.DEBUG,
)
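As a quick sanity check (assuming the configuration above has already run), any logger in the process now writes to log.txt:

import logging

logger = logging.getLogger(__name__)
logger.info('This message ends up in log.txt')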