The best answer on StackOverflow: Using RegEx to parse HTML

BaumGeist@lemmy.ml · 2 months ago

HStone32@lemmy.world · 2 months ago

SO in a nutshell:

“I need to do X”

“Have you tried Y?”

“No, because I don’t need Y, I need X.”

“Well you can do Z if you can’t do Y.”

“OK, sure. But how do I do X?”

“Why do you need to do X?”

(Explains why in my hyper-specific situation, I need to do X, and Y and Z won’t work)

This question has been marked as a duplicate of “How to do Y”

winterayars@sh.itjust.works · 2 months ago

I would say it’s more like: “How can I do X?” “Here are some reasons you can’t do Y.”

The answers should have been “Here are some reasons doing X is hard, but here’s an attempt at it anyway and also some more robust alternatives to doing X.” That would have been an excellent answer. (If you go down far enough you do start to see things like this but they’re hindered by people still responding that you can’t do Y or downvoting because they don’t understand what’s happening.)

interdimensionalmeme@lemmy.ml · 2 months ago

Always start SO questions by pre-empting the X/Y problem.

These people are everywhere and will stop at nothing to make you click on one of these

https://xyproblem.info/ https://news.ycombinator.com/item?id=34444353 https://en.wikipedia.org/wiki/XY_problem

They are trying to derail your question, which was already a generalized version of your actual question. To satisfy them, you would need to explain everything you generalized out of it (which would probably all get deleted by someone editing your question and removing all the irrelevant facts). By that point your question has become so complicated that nobody can answer it, even though they could have answered the generalized version.

My advice: just use ChatGPT or Mistral. 99% of the time you will get a better answer than on StackOverflow, and you will get that actionable answer IMMEDIATELY!

communism@lemmy.ml · 2 months ago

OP isn’t trying to parse HTML, though… they are trying to detect opening XML tags, which seems quite achievable with regex.
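A minimal sketch of what "detect opening tags, but not self-closing ones" could look like in Python. It only holds under stated assumptions: attribute values are quoted, and the input has no comments, CDATA sections, or `<script>`/`<style>` bodies containing tag-like text. The names `OPEN_TAG` and `opening_tags` and the sample string are hypothetical, made up for illustration.

```python
import re

# Assumptions: attribute values are quoted, and the input has no comments,
# CDATA sections, or <script>/<style> bodies that merely look like tags.
OPEN_TAG = re.compile(
    r"""
    <([A-Za-z][\w:.-]*)                  # tag name, captured as group 1
    (?:\s+[^\s=/>]+                      # attribute name
       (?:\s*=\s*(?:"[^"]*"|'[^']*'))?   # optional quoted value ('/' and '>' allowed inside)
    )*
    \s*>                                 # the tag must end in '>' with no '/' before it
    """,
    re.VERBOSE,
)

def opening_tags(text: str) -> list:
    """Return the names of opening tags, skipping self-closing ones such as <br/>."""
    return [m.group(1) for m in OPEN_TAG.finditer(text)]

sample = '<p><a href="/x">link</a><br/><hr class="sep" /><img src="a.png"></p>'
print(opening_tags(sample))
```

Running it prints `['p', 'a', 'img']`. A tag like `<a title="x > y">` is still matched as one opening tag because the quoted value is consumed whole, while unquoted values such as `<a href=/x>` are simply not matched. Anything outside the assumptions (for example a tag inside a comment) can still fool it.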

winterayars@sh.itjust.works · edit-2 2 months ago

It’s still actually pretty sketchy, depending on exactly what you want to do. Strict regex still won’t be able to match correctly if you want to match what an HTML parser considers the opening tag, though fancier regex will. If you’re just looking for the tags in the HTML document as a flat document it’s doable, though. (Mostly.)
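A sketch of the gap between "tags in a flat document" and "what an HTML parser considers the opening tag", in Python. It compares a naive regex scan with the standard-library `html.parser.HTMLParser`, used here only as one example of an HTML parser (it is lenient and is not a full HTML5 tree builder). The class name `StartTagCollector` and the sample document are hypothetical.

```python
import re
from html.parser import HTMLParser

# Flat scan: anything that looks like "<name" is counted as an opening tag.
NAIVE = re.compile(r"<([A-Za-z][\w-]*)")

class StartTagCollector(HTMLParser):
    """Record start tags as Python's stdlib HTML parser reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

# Made-up sample: one tag hidden in a comment, one "tag" that is really script text.
doc = "<div><!-- <span> --><script>if (a <b) {}</script><p>hi</p></div>"

naive = NAIVE.findall(doc)
collector = StartTagCollector()
collector.feed(doc)
collector.close()

print(naive)           # regex view of the document
print(collector.tags)  # parser view of the document
```

The regex also reports `span` and `b`; the parser does not, because the first sits inside a comment and the second is script text. Which answer is "right" depends on which of the two views you actually need.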

solrize@lemmy.world · edit-2 2 months ago

There is a famous Erik Naggum rant about XML at, no wait, I better not link it but you can find it with a search engine if you want, which means you don’t get to complain to me about it since you are the one who went looking for it. Very NSFW and VERY politically incorrect. Naggum died in 2009 but anyone who published a thing like that today would be raked over the coals.

Nariom@lemmy.world · 2 months ago

I once applied for an internship at a company that aggregated job offers. During the interview they explained that the core of what they did was parsing (partial) HTML with regex. When I asked why they didn’t develop a custom parser, they replied that they were working on it, but that the internship wouldn’t focus on that. I was not disappointed when I didn’t get the job.

fluckx@lemmy.world · 2 months ago

So all the misery in the world is related to webdevs trying to parse html with regex?

You bastards.

fubo@lemmy.world · 2 months ago

Once you learn about parser combinators, all other parsing looks pretty dopey.
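For anyone who hasn't met them: with parser combinators, each parser is an ordinary function, and bigger parsers are built by gluing small ones together with higher-order functions. This is a from-scratch Python sketch with no library assumed; the names `lit`, `satisfy`, `seq`, `many`, `fmap` and the toy opening-tag grammar are hypothetical, chosen for illustration, and the grammar is far smaller than real HTML (double-quoted attribute values only).

```python
from typing import Any, Callable, Optional, Tuple

# A parser is a function (text, pos) -> (value, new_pos), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def lit(s: str) -> Parser:
    """Match the exact string s."""
    def p(text, i):
        return (s, i + len(s)) if text.startswith(s, i) else None
    return p

def satisfy(pred: Callable[[str], bool]) -> Parser:
    """Match a single character for which pred is true."""
    def p(text, i):
        return (text[i], i + 1) if i < len(text) and pred(text[i]) else None
    return p

def seq(*ps: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def p(text, i):
        values = []
        for q in ps:
            r = q(text, i)
            if r is None:
                return None
            value, i = r
            values.append(value)
        return (values, i)
    return p

def many(q: Parser) -> Parser:
    """Apply q zero or more times, stopping on failure or when it makes no progress."""
    def p(text, i):
        values = []
        while (r := q(text, i)) is not None and r[1] > i:
            values.append(r[0])
            i = r[1]
        return (values, i)
    return p

def fmap(f: Callable[[Any], Any], q: Parser) -> Parser:
    """Transform the value produced by q."""
    def p(text, i):
        r = q(text, i)
        return None if r is None else (f(r[0]), r[1])
    return p

# Toy grammar for an opening tag such as <a href="/x">.
space = satisfy(str.isspace)
spaces = many(space)
name = fmap(lambda r: r[0] + "".join(r[1]),
            seq(satisfy(str.isalpha), many(satisfy(lambda c: c.isalnum() or c in "-:"))))
quoted = fmap(lambda r: "".join(r[1]),
              seq(lit('"'), many(satisfy(lambda c: c != '"')), lit('"')))
attr = fmap(lambda r: (r[2], r[4]), seq(space, spaces, name, lit("="), quoted))
open_tag = fmap(lambda r: (r[1], dict(r[2])),
                seq(lit("<"), name, many(attr), spaces, lit(">")))

print(open_tag('<a href="/x" title="x > y">', 0))
print(open_tag("<br/>", 0))
```

The first call returns `(('a', {'href': '/x', 'title': 'x > y'}), 27)`; the second returns `None`, because the grammar has no rule for `/>`, so a self-closing tag simply fails to parse. The grammar reads almost like its own description: `<`, a name, some attributes, optional spaces, `>`.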

RegEx match open tags except XHTML self-contained tags