Add to bibliography update a check of every URL if broken vs. accessible. #337

Draft
MattHeffron wants to merge 2 commits into main from mth74--Add_check_if_URL_is_broken
@MattHeffron

Member

Initial attempt. It produces too many false positives, reporting URLs as broken when they are not.
I tried using the Perl LWP library. It was faster than calling curl, but it gave even more false positives.
It may be having trouble with redirects.

@stumbo

stumbo commented Jun 19, 2026

Member

Sounds like the next step is to separate the return values and, based on those, decide whether it's a paywall or a broken link. We'll also need to be careful to handle 429s correctly and back off on the number of requests we're making if needed.

Might want to add a flag to allow skipping the check - I imagine it takes a bit.
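
As a rough illustration of separating the return values and backing off on 429s, here is a minimal Python sketch. It is not code from this PR: the function names, the verdict labels, and the choice of which status codes count as "restricted" (paywall, login, or bot blocking) versus "broken" are all assumptions made for the example. It also only understands the seconds form of the `Retry-After` header, not the HTTP-date form.

```python
from typing import Optional


def classify_status(status: int) -> str:
    """Map an HTTP status code to a coarse link-check verdict (hypothetical labels)."""
    if 200 <= status < 400:
        return "ok"            # assumption: a success or redirect means the link is alive
    if status in (401, 402, 403):
        return "restricted"    # paywall, login, or bot blocking; the page probably exists
    if status == 429:
        return "rate_limited"  # we are sending requests too fast; retry later
    if status in (404, 410):
        return "broken"        # the server says the resource is gone
    return "unknown"           # 5xx and anything else; needs a retry or a manual look


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before the next try after a 429.

    Honors a numeric Retry-After value when one is given; otherwise
    doubles the delay on each attempt (attempt 0 waits `base` seconds).
    The result never exceeds `cap`.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after), cap)
    return min(base * 2 ** attempt, cap)
```

With this split, only "broken" would be reported as a dead link, while "restricted" and "unknown" could be listed separately for manual review.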

…ning why they're getting errors from curl, but (most) not when entered in a browser.
@MattHeffron

Member Author

Here's the bibSplit.err and the .hdr files generated by the bibSplit.pl I just committed.
bibSplit-info.zip

With mostly 403 errors, I had hoped that adding --referer from the bibliography might help. It did not.
Also, there are 3 SSL certificate errors, 1 DNS failure, and 1 failure to connect to the server (possibly just transient).
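
The non-HTTP failures listed above can be told apart by curl's exit code rather than by an HTTP status. The sketch below is illustrative Python, not code from this PR. The exit codes are the ones documented in the curl manual (6, 7, 28, 35, 60), but the verdict labels, the function names, and the choice of which failures count as transient are assumptions for the example.

```python
# curl exit codes as documented in the curl manual; labels are hypothetical.
CURL_EXIT_VERDICTS = {
    0: "completed",         # transfer finished; the HTTP status still needs checking
    6: "dns_failure",       # could not resolve host
    7: "connect_failed",    # failed to connect to host
    28: "timeout",          # operation timed out
    35: "tls_error",        # SSL/TLS handshake failed
    60: "tls_certificate",  # certificate could not be verified against known CAs
}

# Assumption: these failures may clear up on their own, so a retry is reasonable.
TRANSIENT_VERDICTS = {"connect_failed", "timeout"}


def classify_curl_exit(code: int) -> str:
    """Return a coarse verdict for a curl exit code."""
    return CURL_EXIT_VERDICTS.get(code, "other_error")


def worth_retrying(code: int) -> bool:
    """True when the failure looks transient rather than a dead link."""
    return classify_curl_exit(code) in TRANSIENT_VERDICTS
```

Under these assumptions, the single connection failure would be retried before being reported, while the DNS and certificate failures would be reported right away under their own headings.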

MattHeffron self-assigned this Jun 20, 2026
MattHeffron added the enhancement (New feature or request) label Jun 20, 2026