Sure, third-party services like the OP can provide bots that can scan. But if you create an ecosystem in which PRs can be submitted by threat actors, part of your commitment to the community should be to provide visibility into attacks that cannot be seen by the naked eye, and make that protection the norm rather than the exception.
[0] https://docs.github.com/en/get-started/learning-about-github...
The article is about in JavaScript, although it can apply to other programming languages as well. However, even in JavaScript, you can use \u escapes in place of the non-ASCII characters. (One of my ideas in a programming language design intended to be better instead of C, is that it forces visible ASCII (and a few control characters, with some restrictions on their use), unless you specify by a directive or switch that you want to allow non-ASCII bytes.)
The mere fact that a software maintainer would merge code without knowing what it does says more about the terrible state of software.
Sure the payload is invisible (although tbh im surprised it is. PUA characters usually show up as boxes with hexcodes for me), but the part where you put an "empty" string through eval isn't.
If you are not reviewing your code enough to notice something as non sensical as eval() an empty string, would you really notice the non obfuscated payload either?
Things that vanish on a printout should not be in Unicode.
Remove them from Unicode.
Innocuous PR (but do note the line about "pedronauck pushed a commit that referenced this pull request last week"): https://github.com/pedronauck/reworm/pull/28
Original commit: https://github.com/pedronauck/reworm/commit/df8c18
Amended commit: https://github.com/pedronauck/reworm/commit/d50cd8
Either way, pretty clear sign that the owner's creds (and possibly an entire machine) are compromised.
For data or code hiding the Acme::Bleach Perl module is an old example though by no means the oldest example of such. This is largely irrelevant given how relevant not learning from history is for most.
Invisible characters may also cause hard to debug issues, such as lpr(1) not working for a user, who turned out to have a control character hiding in their .cshrc. Such things as hex viewers and OCD levels of attention to detail are suggested.
Is there ever a circumstance where the invisible characters are both legitimate and you as a software developer wouldn't want to see them in the source code?
Are people using eval() in production code?
I am wondering how that they've LLM, are people using them for making new kind of malicious codes more sophisticated than before?
My clawbot & other AI agents already have this figured out.
/s