Nvidia releases Alias-Free GAN code and pre-trained models, naming it StyleGAN3 (github.com/nvlabs)
243 points by polisteps on Oct 11, 2021 | hide | past | favorite | 60 comments


This produces a kind of artefact I haven't seen before, involving little chains of circles and diamonds, e.g. https://nvlabs-fi-cdn.nvidia.com/stylegan3/images/stylegan3-..., https://nvlabs-fi-cdn.nvidia.com/stylegan3/images/stylegan3-... (hair). I think they follow those glowing coordinate-ish lines from the internal representation.

It also seems to have given some faces contact lenses! https://nvlabs-fi-cdn.nvidia.com/stylegan3/images/stylegan3-..., https://nvlabs-fi-cdn.nvidia.com/stylegan3/images/stylegan3-...


Looks like lizard people...

I wonder if you couldn't just put a bunch of results with and without artifacts into two different bins and do another round of training on them. But I don't know enough about how style transfer and retraining of these nets works to tell whether that is feasible.
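The "two bins" idea is roughly what a GAN's discriminator already does during training, but as a standalone sketch it could look like this: train a small binary classifier on hand-labelled artifact/clean examples and use its score to filter generator outputs. This is a pure-Python toy with made-up 2-D "texture statistics" standing in for real image features; a real pipeline would fine-tune a conv net (or the discriminator itself) instead.

```python
import math
import random

def train_logreg(samples, labels, lr=0.5, epochs=200):
    """Tiny logistic-regression 'artifact scorer' trained with plain SGD."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            z = max(min(z, 30.0), -30.0)         # clamp to avoid overflow
            p = 1.0 / (1.0 + math.exp(-z))       # predicted P(artifact)
            g = p - y                            # gradient of the log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def score(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    z = max(min(z, 30.0), -30.0)
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
# Hypothetical 2-D "texture statistics": artifact images cluster high, clean low.
clean    = [[random.gauss(0.2, 0.1), random.gauss(0.2, 0.1)] for _ in range(50)]
artifact = [[random.gauss(0.8, 0.1), random.gauss(0.8, 0.1)] for _ in range(50)]
X = clean + artifact
y = [0] * 50 + [1] * 50

w, b = train_logreg(X, y)
acc = sum((score(w, b, x) > 0.5) == bool(t) for x, t in zip(X, y)) / len(X)
print(f"training accuracy: {acc:.2f}")  # separable toy data, should be near 1.0
```

Outputs scored above a threshold would go back into the "artifact" bin as negative examples for the next round, which is the manual version of the adversarial loop the sibling comment alludes to.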


It already works that way in some sense


I suspect that because there's a lot of entropy in hair, and because of the shape of the optimization function (which might even have a spatial term), a regular pattern in such a noisy, hard-to-learn region falls into a local minimum while the rest of the image converges to the true minimum. There's a little meat left to optimize here, but you need to do it cleverly, because there's no reason for a neural network to learn all the many combinations of hair pixels in this application. That could require as many parameters as all the neurons involved in generating the faces, I'd bet.


Thinking more about it, the shape of the solution space is sufficiently different for hair vs faces that any given combination of {optimization function, hyperparameters, training data} is unlikely to optimize for both. You probably need some other sort of special tuning, like a spatially local adaptive gradient for regions of hair.


Someone on Twitter pointed out that it has larger repeating patterns too https://twitter.com/Zergfriend/status/1408184663420510209?s=...


Also, some teeth are getting drawn in front of the lips, and the pupils are not round. I think the non-round pupils make the eyes point in two different directions, which is kinda unsettling.

https://nvlabs-fi-cdn.nvidia.com/stylegan3/images/stylegan3-...

https://nvlabs-fi-cdn.nvidia.com/stylegan3/images/stylegan3-...


I appreciate the section on "Synthetic image detection":

"While new generator approaches enable new media synthesis capabilities, they may also present a new challenge for AI forensics algorithms for detection and attribution of synthetic media. In collaboration with digital forensic researchers participating in DARPA's SemaFor program, we curated a synthetic image dataset that allowed the researchers to test and validate the performance of their image detectors in advance of the public release. Please see here for more details on detection" https://github.com/NVlabs/stylegan3-detector

It's important to see this sort of thing happening more and more.


> It's important to see this sort of thing happening more and more.

Why? If we insist on the authenticity of images, we're holding on to the old status quo in the same way we apply book and record copyright to digital content. We don't allow what the tech enables to the fullest; we restrict it by pressing it into the old mold (e.g. by using DRM to keep music a commodity).

I think "photographic proof" is a historical accident of the 20th century (and it was never perfect; those with resources could always manipulate pictures to some extent).

As a thought experiment, it might be interesting to imagine what happens when you "open up the dams" and anyone can synthesize any image they can imagine. In the beginning this will cause a lot of trouble (say, with harassment and fake news), but I believe society will adapt quickly. I think right now there is a real problem with the internet remembering too much (pervasive surveillance on the one hand, and the constant risk of moral outrage over stupid things you did in your past on the other). It would be an antidote if nobody could believe any picture anymore.


> but I believe society will adapt quickly

What gives you this impression? How exactly do you believe society would adapt?


Well, I mean we coped before we had cameras, right? In a sense, we would go back to that. We would have to rely more on witnesses we trust than on evidence.

One point of postmodernism is that the facts (as in what happened exactly when) often don't matter as much as the narrative (unless you are a historian or a scientist). I think this is true to some extent, but we delude ourselves that only the facts matter, and then we can be manipulated by the narrative. It might be interesting to make the narrative explicit.

Imagine a politician caught up in a scandal over an old sex tape. This could ruin their career, no matter how good their work is otherwise. But if the tape is degraded to mere hearsay - maybe it happened, maybe it didn't - then it becomes a question of which image we want to believe. Then the sex tape is one narrative, the campaign is another, and the only concrete things we really have to judge them on are their actual policies and their actual recent work. All the "image" will be drowned out in noise.

Same if you think about people posting stupid stuff on social media when they are young, and then having trouble when they try to find a job. If it is trivial and ubiquitous to fake drinking pictures and dumb old tweets, you can just shake it off with "oh yeah that is fake". The only thing that will count is your impression and your performance in the moment (and accounts from other trusted people).

I'm not saying this would be a good development, or a bad one, just that I think it is a possible interesting consequence of current tech developments...


>and the only concrete thing we really have to judge them on is their actual policies, and their actual recent work. All the "image" will be drowned out in noise.

Completely disagree. What will instead matter is purely someone's image and their ability to fool people. Those who will benefit most from this aren't good people affected by smear campaigns, it's bad people who can easily avoid real criticism of actual wrong doings.

Most political scandals I know of are not something inconsequential, but rather due to corrupt or truly immoral behavior, like the recent corruption affair in Austria. Making politicians immune to this seems like a drastic step backwards.

>Well, I mean we coped before we had cameras, right?

Well, yeah, humanity also coped with frequent famines. I'm not sure going back to something like that would count as 'adapting' to crop failures.


From what I've seen online in the last 5 years, just a mere accusation is sometimes enough to ruin someone's life or career.

At the same time Trump and Johnson both showed that you can literally lie about something you said on live TV a week ago and people will suck it up.


The same way it worked before humanity had the ability to take pictures.


You can't use any of it commercially. Nothing within is under an acceptable software license (neither an open source license nor a free software license). Advance warning.


Just run the whole codebase through OpenAI Codex, then regenerate the source code.

Can't copyright an inferred artifact ;)


You are talking about the models, right? If you train your own model on your own data without transfer learning (or with transfer learning from a liberally licensed third party model once those exist) then you can do whatever you want to with your model, no?


I'm talking about the code. Models are distributed separately.


But you can create models with the code. And then take those models you created and use them commercially. So I don't see a problem.


Nobody will be able to tell if you stick it in a tiny thumbnail and give it enough JPEG artifacts.
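To see why thumbnails hide these artifacts, here is a stdlib-only toy that mimics the two degradations mentioned (downscaling and lossy compression) on a synthetic fine-grained pattern. The box filter and coarse quantization are crude stand-ins for real resampling and JPEG encoding, not an actual codec.

```python
def downscale(pixels, factor):
    """Box-filter downscale of a 2-D grayscale image (list of lists)."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for by in range(0, h - h % factor, factor):
        row = []
        for bx in range(0, w - w % factor, factor):
            block = [pixels[by + dy][bx + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out

def quantize(pixels, step):
    """Coarse value quantization, a stand-in for aggressive JPEG compression."""
    return [[(v // step) * step for v in row] for row in pixels]

# A 16x16 "image" with a fine checkerboard pattern (think GAN texture artifact).
img = [[255 if (x + y) % 2 == 0 else 0 for x in range(16)] for y in range(16)]
small = quantize(downscale(img, 4), 64)
print(small[0][:4])  # → [64, 64, 64, 64]
```

The high-frequency pattern averages out to a uniform mid-gray in every block, so the telltale texture is simply gone from the thumbnail.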


Computer-generated artifacts are non-copyrightable. This is not the problem. (That said, as the law is written, no binary should be, but we already threw that baby out with the bathwater.)

The problem is that the software is not free software, but encourages you to stop using its free predecessors and competition sneakily.


Or it’s just paranoia, and they plan to license this for commercial use after a period of testing, leaving non-commercial use for those who want to play with it. That is basically what every piece of non-free software does, except that the commercial offering is not yet provided.

Also, I second the question about free predecessors/competition, not to argue or compare, but out of pure curiosity.


I’m not sure I understand your claim, why is this encouraging anyone to stop using any competition? The license says plainly it’s not for commercial use, what’s so sneaky?

What do you define as free software? This software is open source, always free as in beer, and free as in freedom for research and evaluation purposes (and seems fairly permissive to researchers…)


It by definition is not open source. The term has a definition. This breaks literally the first rule.

https://opensource.org/osd


I used, or maybe misused, the term open source. You used “free”. The license & project used neither, and made no claim to align with opensource.org’s philosophy or definition. Whatever you call it, the source code has been released for anyone to read and “evaluate”, that’s what I meant by ‘open’.

You didn’t answer the question - how is this sneaky, and how does it prevent using previous projects?


The Open Source Initiative coined the term to begin with. Using it incorrectly is harmful, and is how we've ended up with "literally" meaning "figuratively" in modern English. By insisting on the correct definition, I'm trying to prevent the same from happening to open source. It's pretty offensive to act like it's not a big deal to use something so essential to computing freedom in a cavalier way to intentionally lessen freedom.


OSI was not the first to use the phrase "open source". The phrase was in common use for other types of publicly available material for decades prior to 1998, when OSI adopted it to describe software licenses.

One example from 1971: https://www.google.com/books/edition/United_States_Code/3j2P...

There are also other (quite valid) authorities on software licensing other than OSI which have differing opinions on which licenses specifically qualify.

For example: most people would probably agree that BSD was open source, despite OSI's lack of approval of its original license. And I hardly think that's 'harmful' in any way.


Another problem with assuming that a non-commerce clause in the license automatically means software is not open source is that the US government defines commercial software as any software that is licensed to the public, which includes most open source software, even by OSI’s standards.

“in nearly all cases, open source software is considered "commercial software" by U.S. law, the FAR, and the DFARS. DFARS 252.227-7014 specifically defines "commercial computer software" in a way that includes nearly all OSS”

https://dodcio.defense.gov/open-source-software-faq/#Q:_Is_o...


Literally has meant figuratively for hundreds of years, Dickens used it that way. https://www.merriam-webster.com/words-at-play/misuse-of-lite...

There is no “correct” definition of the term “open source”. People use it to mean many things. If it has any license other than “public domain”, then it limits some freedoms in some ways.

You still didn’t back up your claims: what is sneaky, what predecessor does this license prevent use of?


I agree with you, but I don't think anyone from Nvidia called it "open source" (I agree that 'dahart incorrectly did so). It's a shame that GitHub allows non-open-source code, but it does, and nothing else about it implies that it's open-source.


Fwiw, Dave (dahart) currently works for NVIDIA :).


Hang on, that’s purely incidental in this context. I don’t represent this project in any way, and I only called it open source by accident here. Nobody associated with the project has suggested that it’s open source by OSI’s standards.


The English word 'free' used to describe software has been, and is, problematic. It seems to generate a lot of heat and very little light in conversation. I support GPL software directly, and it's important to make sure, right away, that the person you are talking to is using the same terms to mean the same thing.


Just use the code to build a source-code auto-completion model, then wire the model up to your text editor and write new source code.



Obviously if you're using the GAN equivalent of a Megamind meme you won't get away with it, but more reasonable ones you can.


I worked with these guys at Nvidia's Helsinki office. They are super chill and somehow just crank out superb research. Very interesting bunch.


I know HN doesn't like hype, but as an AI neophyte, I find this incredible. Nvidia is doing it again. This is likely going to help with 3D generation, the next cornerstone. Imagine that we are solving the problems so fast.


There are videos that show what they mean by "details glued to image coordinates" in StyleGAN2: https://nvlabs-fi-cdn.nvidia.com/stylegan3/videos/



I was joking, but it appears that StyleGAN3's new approach allowed it to develop unsupervised 3D maps of faces within the deeper layers, which might result in interesting things when hacked by researchers.


It really bums me out that they didn’t name it GANnamStyle.


And it fits pretty well, too: GAN Nvidia Alias-free Models Style.


From the license file:

> 3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor (including any claim, cross-claim or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then your rights under this License from such Licensor (including the grant in Section 2.1) will terminate immediately.

Is such a clause legal? I have basically zero knowledge of such things, but it seems like it should be illegal to punish someone for a good faith patent claim.


As defined in the license, the capitalized term "Work" means only the StyleGAN3 software and derivatives. So it means you can't use StyleGAN3 while simultaneously claiming it infringes one of your patents, but it doesn't mean Nvidia can use StyleGAN3 against you as leverage in an unrelated patent suit.

I'm not a lawyer, and I won't comment on whether this is legal, but I'll note that it's quite similar to the patent clause in section 3 of the Apache License 2.0.

https://www.apache.org/licenses/LICENSE-2.0


Apache 2.0 has a very similar clause, but there might be subtle differences in the wording that make this one broader or stricter.


Yes, they are legal, and I'm not sure I follow the argument that they shouldn't be.


The argument against it, presumably, is that "If you try and make us pay for committing crime, you won't get access to our toys anymore" is very strange and seems illegal, since the ability to play with toys should not stop anyone from reporting violations of the law.

But at the same time, it's definitely legal, for better or for worse, as is pretty much any stunt you pull with the joke that is US IP law.


This is not illegal in any country I can think of. Your take around what it exists to do is, IMHO, over the top.

I'm not aware of a country that requires you let people sue you in this sort of situation, or requires that you not terminate their contract if you do.


Is patent violation a crime? My understanding is that it's a civil issue - like if you kick me out of your property, I'll kick you out too.


It is a civil issue rather than a criminal one, albeit still a violation of the law, which is why I used "crime" in the same way that the US government calls piracy and copyright infringement a crime despite it generally being a civil offense.


It is legal, yes. That license is awful and proprietary (3.3), but it's most definitely legal.


Everyone is nitpicking the licenses involved in this thread. Is this the right thing to do?


Is the minimum 12 GB requirement on purpose, to make people buy new GPUs? It's sad that this growing area is becoming accessible only to the privileged.


It's not on purpose but a direct consequence of the computational requirements of the model.

I say this often and I cannot stress this enough: this is not magic! High-quality results require a lot of computation, and with it a lot of other resources, like VRAM.
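As a back-of-envelope illustration of how quickly VRAM adds up (the per-resolution channel counts and batch size below are generic guesses for a progressive 1024×1024 generator, not StyleGAN3's actual configuration):

```python
# Rough activation-memory estimate for a progressive 1024x1024 generator.
# Channel counts and batch size are illustrative guesses, NOT StyleGAN3's config.
layers = [          # (resolution, channels)
    (4, 512), (8, 512), (16, 512), (32, 512),
    (64, 512), (128, 256), (256, 128), (512, 64), (1024, 32),
]
batch = 32          # assumed training batch size
bytes_per_value = 4 # fp32

total = sum(res * res * ch for res, ch in layers) * batch * bytes_per_value
print(f"activations alone: ~{total / 2**30:.1f} GiB (before weights, "
      f"gradients, optimizer state, and the discriminator)")
```

That lands around 8 GiB for activations alone under these assumptions, before counting weights, gradients, optimizer state, or the discriminator, which is why a 12 GB card is a plausible floor rather than an arbitrary one.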

This area of research has always been resource intensive and no one was surprised by this back in the 1990s when some research required hardware that wasn't available to mere mortals (looking at you, SGI Indigo and Octane).

You should look at it the other way around: it's astonishing that you can use off-the-shelf consumer hardware (here: mid-range graphics cards) to participate in and benefit from cutting-edge research.

Better yet, services like Google Colab even allow you to play with this for free if you don't own the required hardware.

BTW, it's not the researchers' fault that the market is f'ed up again due to shortages and crypto mining. Otherwise you'd be able to buy a 12 GB card for around $330 (e.g. the RTX 3060), which doesn't sound particularly outrageous to me.


We need FOSS GPUs + compute!


Unless you discover a novel process for making chips, the economics of semiconductors can never make sense in FOSS. You would end up paying far more than the current market price.


> This material is based upon work supported by the US Defense Advanced Research Projects Agency (DARPA) under Contracts No.R00112030005, HR001120C0123, HR001120C0124 and FA8750-20-2-1004 and the Air Force Research Laboratory (AFRL) under Contract No. FA8750-20-2-1004.

This is why AI is just a marketing term with no real future.

There isn't room for corporations to profit from it out of the gate and into the future indefinitely. No one is going to pay an AWS tax to use their models on every single API hit forever. No one is going to pay Nvidia a license fee to use their image recognition tools forever. If the creators of HTML, CSS, and JavaScript had wanted license fees, we wouldn't be using them right now either.

There are two groups of people, off the top of my head, who care about all of this:

1) The US Military, because the budget for their murder robots is theoretically infinite.

2) Google and Facebook because the budget for their spyware is theoretically infinite.

To everyone else, it's much ado about nothing.


Nvidia makes most of their money selling GPUs as computational accelerators-- both generic CUDA and neural network applications.

They don't _need_ to profit off their ML models. It's a value added service. Ensuring their hardware has marketshare at the leading edge of the ecosystem is the main point.


They didn't profit off of ML models, they profited from the DARPA contract. See point above about murder robots.



