The new BBS is in Alpha. Posts from before 22 July 2024 may not be transferable, but they will remain available to read at /old/.
Today in web crawler development:
- XHTML is now supported as a webpage type
- Links to blocked domains are properly skipped instead of throwing errors
- I rolled my own sitemap parser, because the library I had been using was slooooooow and plain lxml is so much faster, especially for such a simple task
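For anyone curious what a hand-rolled sitemap parser can look like, here is a minimal sketch. It is not the crawler's actual code. It assumes the standard sitemaps.org namespace and handles both a `<urlset>` (page URLs) and a `<sitemapindex>` (links to child sitemaps). It uses the standard library's `xml.etree.ElementTree` so it runs without extra dependencies. `fromstring` and `findall(path, namespaces)` behave the same way in lxml, so swapping in `from lxml import etree` should work for these calls. The function name `parse_sitemap` is hypothetical.

```python
# Standard-library stand-in for lxml; `from lxml import etree` supports the same calls used here.
from xml.etree import ElementTree as etree

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_bytes):
    """Return (page_urls, child_sitemap_urls) from a sitemap or a sitemap index."""
    root = etree.fromstring(xml_bytes)
    # <urlset><url><loc>...</loc></url></urlset> lists actual pages.
    pages = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]
    # <sitemapindex><sitemap><loc>...</loc></sitemap></sitemapindex> points at more sitemaps.
    children = [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS) if loc.text]
    return pages, children


EXAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/about </loc></url>
</urlset>"""

pages, children = parse_sitemap(EXAMPLE)
# pages == ["https://example.com/", "https://example.com/about"], children == []
```

A crawler would then queue `pages` for fetching and call `parse_sitemap` again on each URL in `children`.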
Only six TODO comments remaining on the crawler!
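The first two items (XHTML as a page type, and skipping blocked domains instead of erroring) can be sketched like this. It is a hypothetical illustration, not the crawler's code. `BLOCKED_DOMAINS`, `is_blocked`, `is_page` and `filter_links` are invented names. The sketch assumes the blocklist matches a domain and all of its subdomains. It also assumes pages are recognised by their `Content-Type` header, where `application/xhtml+xml` is the registered media type for XHTML.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; the real crawler's list isn't shown in the post.
BLOCKED_DOMAINS = {"blocked.example"}

# Media types treated as crawlable pages (XHTML alongside plain HTML).
PAGE_TYPES = {"text/html", "application/xhtml+xml"}


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """True if the URL's host, or any parent domain of it, is on the blocklist."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    # For a.b.blocked.example, check a.b.blocked.example, b.blocked.example, blocked.example, example.
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def is_page(content_type):
    """True for HTML or XHTML responses, ignoring parameters such as '; charset=utf-8'."""
    return content_type.split(";", 1)[0].strip().lower() in PAGE_TYPES


def filter_links(links, blocked=BLOCKED_DOMAINS):
    """Drop links to blocked domains instead of raising an error for them."""
    return [url for url in links if not is_blocked(url, blocked)]
```

Filtering the links before they reach the fetch queue means a blocked domain never turns into an exception further down the pipeline.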
@amin XHTML pages still exist? :O