I love everything about the move to Apple Silicon with the exception of the deci...

sudhirj · on Nov 17, 2020

One, this is repeating the iPhone vs Android comparisons. iPhones with 4GB RAM feel faster and get more work done than Androids running Qualcomm ARM with double that amount of RAM. The faster IO becomes the cheaper paging becomes, and macOS and iOS have a lot of work done to handle paging well.

Two, this is the entry level processor, made for the Air, which is what we get for students, non-technical family members and spare machines. Let’s see what the “pro” version of this is, the M1X or whatever. We already know this chip isn’t going to go as is into the 16 inch MacBook Pro, the iMac Pro or the Mac Pro. I’d like to see what comes in those boxes.

cactus2093 · on Nov 17, 2020

I get what you're saying, I'm also looking forward for the even higher performing machines with 12 or 16 core cpus (8 or 12 performance cores + the 4 efficiency cores?), 32gb ram option, 4 thunderbolt lanes, and more powerful gpus. Wondering exactly how far apple can push it, if this is what they can do in essentially a ~20TDP design.

On the other hand it's quite funny that the title of this article is "16-inch MBP with i9 2x slower than M1 MacBook Air in a real world Rust compile" and the comments are still saying "yeah but this is entry level not pro".

Apparently Pros are more concerned about slotting into the right market segment than getting their work done quickly :)

aunetx · on Nov 17, 2020

I may be wrong, but the ecosystem does not really change here right? I mean, memory management should be roughly the same between x86_64 and arm regarding the amount of ram used, so I guess 16gb of ram under old macbooks is the same as 16gb under the new ones

sudhirj · on Nov 17, 2020

All else being equal, yes, but the memory is faster, closer to the chip, has less wiring to go through, and because of vertical integration they can pass by reference instead of copying values internally on the hardware. The last one is big - because all the parts of the SoC trust each other and work together they can share memory withing having to copy data over the bus. That coupled with superfast SSDs means that comparisons aren't Apples to oranges, excuse the pun.

16GB of memory on-die shared by all the components of an SoC is not the same as 16GB made available to separate system components, each of which will attempt to jealously manage their own copy of all data.

kcartlidge · on Nov 17, 2020

I'm not a hardware person, but I do software for a living. Your comment makes things much clearer.

You're saying that the effective difference in having the shared memory is that you get more data passed by reference and not by value at the lower levels?

If that's true, then you get extra throughput by only moving references around instead of shuffling whole blocks of data, and you also gain better resource usage by having the same chunks of allocated memory being shared rather than duplicated across components?

sudhirj · on Nov 18, 2020

That’s how I understand it, yes. I’m not into hardware either, going by engineering side of the event. In the announcement, there some parts shot at the lab/studio, where the engineers explain the chip. Ignore the marketing people with their unlabelled graphs, the engineers explain it well.

But yes, they’re basically saying because this is “unified memory”, there’s no copying. No RAM copies between systems on the SoC, no copies between RAM and VRAM, etc. because the chips are working together, they put stuff on the RAM in formats they can all understand, and just work off that.

andrekandre · on Nov 17, 2020

> because of vertical integration they can pass by reference instead of copying values internally on the hardware.

got any links about that?

sudhirj · on Nov 18, 2020

Going by the engineering explanations in the announcement video. See the segments shot in the “lab” set. They’re actually pretty proud of this and are explaining the optimisations quite candidly.

andrekandre · on Nov 18, 2020

interesting, thanks

wmf · on Nov 17, 2020

"Controlling the ecosystem" and "integration" and such are just wishful-thinking rationalizations. Chrome and Electron will use however much RAM they use; Apple can't magically reduce it. If you need 32GB you need 32GB.

diebeforei485 · on Nov 17, 2020

Slack (probably the most popular Electron app) has confirmed they are going native on Apple Silicon: https://twitter.com/SlackEng/status/1326237727667314688?s=20

wmf · on Nov 17, 2020

This is still Electron, no?

zerkten · on Nov 17, 2020

My guess is that it's an ARM build of Electron - unless they've been working to bring the iOS version over? That would be a huge win.

Even if this is Electron, I suspect this still great news for anyone that needs Slack. The Rosetta 2 performance of Electron would likely be a dog and Slack is a very high profile app with a lot of visibility.

sudhirj · on Nov 18, 2020

Yeah, that’s partly true. Applications that allocate 1000GB will need to get what they ask for. No getting around bad applications. The benefits are more in terms of lower level systems communicating by sharing memory instead of sharing memory by communicating, which is always faster and needs less memory, but needs full trust and integration.

ogre_codes · on Nov 17, 2020

> You can't get an M1 configuration right now larger than 16GB which is a table-stakes baseline dev requirement today.

Everyone on my team has been using 15" MacBook Pros with 16GB RAM for the past 3 years. I suspect most developers run with 16GB of RAM just fine.

I'm not arguing "16GB is fine for all developers everywhere!", but it's absolutely not a hard requirement. I suspect for a lot of us, the difference in performance between 16GB and 32GB is trivial.

Regardless, the thing which is kind of stunning about this chip is that they are getting this kind of performance out of what is basically their MacBook Air CPU. Follow on CPUs—which will almost certainly support 32GB RAM—will likely be even faster.

JimDabell · on Nov 17, 2020

> Regardless, the thing which is kind of stunning about this chip is that they are getting this kind of performance out of what is basically their MacBook Air CPU.

Or to put it a different way: this is the slowest Apple Silicon system that will ever exist.

Someone · on Nov 17, 2020

Laptop or desktop: likely, but even if the next Apple Watch will be faster, which I doubt, their smart speakers and headphones probably can do with a slower CPU for the next few years.

mft_ · on Nov 17, 2020

Is there a name for this trait of bringing unnecessary precision to a discussion, I wonder?

I mean, contextually it’s obvious that the previous poster meant this is the slowest Apple Silicon that will ever exist in a relevant and comparable use case - i.e. a laptop or desktop. And the clarification that yes, slower Apple Silicon may exist for other use cases didn’t really add value to the discussion.

And I’m not even being snide to you - I’m genuinely interested whether there’s a term for it, because I encounter it a lot - in life, and in work. ‘Nitpicking’ and ‘splitting hairs’ don’t quite fit, I think?

fpgaminer · on Nov 18, 2020

I don't have a name for it, but I agree that it should have a name. It's a fascinating behavior. I nitpick all the time, though I don't actually post the nitpicks unless I really believe it's relevant. Usually I find such comments to be non-productive, as you mention.

And yet, even though I often believe nitpicks to be unnecessary parts of any discussion, I also believe there is a certain value to the kind of thinking that leads one to be nitpicky. A good programmer is often nitpicky, because if they aren't they'll write buggy code. The same for scientists, for whom nitpicking is almost the whole point of the job.

It's just an odd duality where nitpicking is good for certain kinds of work, but fails to be useful in discussions.

DeRock · on Nov 18, 2020

It sounds like the word you’re looking for is ‘pedantic’.

bocklund · on Nov 17, 2020

Maybe "overparticular"

JimDabell · on Nov 18, 2020

Everything I have seen from Apple talks about Apple Silicon as the family of processors intended for the Mac, with M1 as the first member of that family.

I know other people have retroactively applied the term “Apple Silicon” to other Apple-designed processors, but I don’t think I’ve seen anything from Apple that does this. Have you?

whalesalad · on Nov 17, 2020

I think if you have a very specific role where your workload is constant it makes sense. I am an independent contractor and work across a lot of different projects. Some of my client projects require running a full Rails/Worker/DB/Elasticsearch/Redis stack. Then add in my dev tools, browser windows, music, Slack, etc... it adds up. If I want to run a migration for one client in a stack like that and then want to switch gears to a different project to continue making progress elsewhere I can do that without shutting things down. Running a VM for instance ... I can boot a VM with a dedicated 8GB of ram for itself without compromising the rest of my experience.

That is why I think 16GB is table stakes. It is the absolute minimum anyone in this field should demand in their systems.

Honestly the cost of more RAM is pretty much negligible. If I am buying laptops for a handful of my engineers I am surely going to spend $200x5 or whatever the cost is once to give them all an immediate boost. Cost/benefit is strong for this.

mixmastamyk · on Nov 17, 2020

All of this is doable in 16GB, I do it everyday with a 3.5GB Windows 10 VM running and swap disabled. There are many options as well such as closing apps and running in the cloud.

ogre_codes · on Nov 17, 2020

Update: Re-reading your above comment I realized I mis-read your post and though you were suggesting 32GB was table-stakes... which isn't quite right. Likewise much of below is based on that original mis-read.

I'm not convinced that going from 16GB to 32GB is going to be a huge instant performance boost for a lot of developers. If I was given the choice right now between getting one of these machines with 16GB and getting an Intel with 32GB, I'd probably go with the M1 with 16GB. Everything I've seen around them suggests the trade-offs are worth it.

Obviously we have more choices than that though. For most of us, the best choice is just waiting 6-12 months to get the 32GB version of the M? series CPU.

danaris · on Nov 17, 2020

I've seen others suggest that 32GB is table-stakes in their rush to pooh-pooh the M1.

I, personally, am a developer who has gone from 16GB to 32GB just this past summer, and seen no noticeable performance gains—just a bit less worry about closing my dev work down in the evening when I want to spin up a more resource-intensive game.

nthj · on Nov 17, 2020

I agree with this. I don't think I could argue it's table stakes, but having 32GB and being able to run 3 external monitors, Docker, Slack, Factorio, Xcode + Simulator, Photoshop, and everything else I want without -ever- thinking about resource management is really nice. Everything is ALWAYS available and ready for me to switch to.

tempest_ · on Nov 17, 2020

At some point it is easier to have something sitting in a rack somewhere. That way you dont have to cook your ultrabook to run that stuff.

ogre_codes · on Nov 17, 2020

People have been saying this kind of thing for years, but so far it doesn't really math out.

Having a CPU "in the cloud" is usually more expensive and slower than just using cycles on the CPU which is on your lap. The economics of this hasn't changed much over the past 10 years and I doubt it's going to change any time soon. Ultimately local computers will always have excess capacity because of the normal bursty nature of general purpose computing. It makes more sense to just upscale that local CPU than to rent a secondary CPU which imposes a bunch of network overhead.

There are definitely exceptions for things which require particularly large CPU/ GPU loads or particularly long jobs, but most developers will running local for a long time to come. CPUs like this just make it even more difficult for cloud compute to be make economic sense.

imtringued · on Nov 18, 2020

As someone who is using a CPU in the cloud for leisure activities this is spot on. Unless you rent what basically amounts to a desktop you're not going get a GPU and high performance cores from most cloud providers. They will instead give you the bread and butter efficient medium performance cores with a decent amount of RAM and a lot of network performance but inevitable latency. The price tag is pretty hefty. After a few months you could just buy a desktop/laptop system that fits your needs much better.

wil421 · on Nov 17, 2020

Larry Ellison proposed a thin client that was basically a dumb computer with a monitor and nic that connected to a powerful server in the mid 1990s.

For a while we had a web browser which was kinda like a dumb client connected to a powerful server. Big tech figured out they could push processing back to the client by pushing JavaScript frameworks and save money. Maybe if arm brings down data center costs by reducing power consumption we will go back to the server.

whycombagator · on Nov 17, 2020

What kind of development are you doing?

16gb is OK for my needs at home running linux, but on the odd occasion I wish I had more.

At work I find 32Gb is barely enough.

ogre_codes · on Nov 17, 2020

I would turn that around. What kind of development are you doing where you feel 32GB is "Barely enough"?

Right now I primarily work on a very complex react based app. I've also done Java, Ruby, Elixir, and Python development and my primary machine has never had 32GB.

More RAM is definitely better, but when I hear phrases like "32GB is barely enough", I have to wonder what in the hell people are working on. Even running K8s with multiple VMs at my previous job I didn't run into any kind of hard stops with 16GB of RAM.

bronson · on Nov 17, 2020

One data point: when I was consulting a year ago, I had to run two fully virtualized desktops just to use the client's awful VPNs and enterprise software. Those VMs, plus a regular developer workload, made my 16GB laptop unusable. Upgrading to 32GB fixed it completely.

mixmastamyk · on Nov 17, 2020

Desktops can use less memory than folks many folks think. I have a VM of Windows 10 in 3.5 GB running a VPN, Firefox, Java DB app, and ssh/git. For single use, memory could be decreased.

I think the art of reducing the memory footprint has been lost. Whenever I configure a VM for example, I disable/remove all the unused services and telemetry as the first step. This approaches an XP memory footprint.

imtringued · on Nov 18, 2020

That's not what this discussion is about. 16GB is definitively limiting if you run VMs but 32GB should be plenty. If you need more then either you are running very specialized applications which means your own preferences are out of touch with the average developer or you are wasting all of the RAM on random crap.

thom · on Nov 17, 2020

If you do machine learning or simulations with big datasets and lots of parameters it does become an issue, but I will admit I could just as easily run these things on a server. I don’t think I’ve ever maxed out 32gb doing anything normal.

mixmastamyk · on Nov 17, 2020

> What kind of development are you doing?

Sounds like folks never want to close an app. It could be a productivity booster if you want to spend the money and electricity, but is rarely a requirement.

DonaldPShimoda · on Nov 17, 2020

Keep in mind that so far Apple has only started offering M1 on what are essentially entry-level computers. I think it's likely there will be a 32GB Unified Memory version for the 16" MBP (which maybe will become available on the 13" or Mac Mini too).

I think M1 would not be able to achieve the performance and efficiency improvements if the RAM were not integrated, so they'll stick with Unified Memory for the time being. I don't think this will be as tenable for the Mac Pro (and maybe not even iMac Pro), but those are probably much further from Apple Silicon than anything else, so we'll see what happens.

whalesalad · on Nov 17, 2020

I agree with you completely. I am looking forward to the next offering and hope that they have a plan for more memory.

In the meantime I wonder if they are going to do dual (or more) socket configurations. I was just thinking to myself imagine a Mac Pro with 8 of these M1 chips in it all cooled by one big liquid heat block. That thing would rip.

DonaldPShimoda · on Nov 18, 2020

> hope that they have a plan for more memory.

I can't imagine them not doing it. If they were satisfied that 16GB was sufficient, I would've expected them to also refresh the 16" MBP with M1. I think the fact that they didn't is a good indicator that something about M1 isn't ready for the big boy league, and my guess is RAM will factor into that.

My guess is that the second generation (M2?) will improve performance with little efficiency gain and will include up to 32GB "Unified Memory". And then binning will be used to produce the 16GB and 8GB variants.

> I wonder if they are going to do dual (or more) socket configurations.

Whoa, that's something I hadn't thought of! I wonder if M1 is amenable to that kind of configuration. That would be pretty neat!

arpigartigar · on Nov 23, 2020

(This should be a reply to an older comment of yours and I realise it's probably bad form to be posting it here, but I couldn't find any other way to contact you)

A few weeks ago you made a comment (https://news.ycombinator.com/item?id=24653498) where you mentioned a PL Discord server. Could I get an invite? I can be reached through aa.santos at campus.fct.unl.pt if you'd rather not discuss it in public/if you'd like to verify my identity.

Sorry to everyone else in the thread for being off-topic.

DonaldPShimoda · on Nov 24, 2020

Hello! Yes, HN's lack of notifications really poses a problem in situations like this. Sigh.

However, the answer to your question is fortunately a simple one: the Discord server mentioned is run by the /r/ProgrammingLanguages community over on Reddit [0]! If you go to that page (might need to be on a desktop browser because ugh) and look in the sidebar/do a search for "Discord server", you'll find a stable invite link.

Alternatively, I can just provide you with the current link [1] and note that it may not work forever (for anybody who finds this comment in the future).

[0] https://www.reddit.com/r/programminglanguages/

[1] https://discord.gg/yqWzmkV

ddingus · on Nov 17, 2020

Seems to me, off chip RAM becomes a new sort of cache.

If Apple sizes the on chip RAM large enough for most tasks to fly, bigger system RAM can get paged in and performance overall would be great, until a user requires concurrent performance exceeding on chip RAM.

DonaldPShimoda · on Nov 18, 2020

Hmm that's a good point!

The thing I worry about is that the whole appeal of the Mac Pro is upgradeability — you can replace components over time. So integrated RAM would be problematic since that's a component people definitely like to upgrade.

But with your idea... I dunno, if they could pull that off that would be super cool!

ddingus · on Nov 18, 2020

Yeah I think so too. If they can execute from off chip Ram, perhaps with a wait state or whatever it takes, for a ton of use cases no one will even notice.

It will all just effectively be large RAM.

Doing that coupled with a fast SSD, and people could be doing seriously large data work on relatively modest machines in terms of size and such.

A very simple division could be compute bound code ends up being on chip RAM, I/O bound code of any kind ends up in big RAM, off chip.

Doing just that would rock hard.

bufferoverflow · on Nov 17, 2020

Entry level? Did you see their prices?

DonaldPShimoda · on Nov 18, 2020

I mean, as far as Apple computers go... yeah, these are absolutely entry level. And $700 (new M1 Mac Mini starting price) is really pretty reasonable even compared to other options, honestly.

gumby · on Nov 17, 2020

> memory on-die or in-package (not sure how it is configured). They call it 'Unified Memory'

It’s in the package; RAM on the die is called “cache”.

“Unified memory” has nothing to do with packaging. It’s the default for how computer memory has worked since, well, the 1950s: all the parts talk to the same pool of memory (and you can DMA data for any device).

That’s why the term of art for, say, GPUs having their own memory, is called NUMA (“Non-Uniform Memory Access”): unified is the default.) *

M1 is a remarkable chip and Apple doesn’t claim that UFA is some invention: they just used the technical term, just as they say their chips use doped silicon gates. It has become unusual these days and worth their mentioning, but it’s simply ignorance by the reporters that elevated it to seem like some brand name.

* https://en.wikipedia.org/wiki/Non-uniform_memory_access

Symmetry · on Nov 17, 2020

In a NUMA system all the memory is in the same address space even if it's faster for a core to talk to some places than other. Traditionally GPUs work in a completely different address space and doesn't use virtual memory. Yes you can DMA to it but if you DMA a pointer it will break on the other end.

heisenbergs · on Nov 17, 2020

According to anandtech the memory throughput is off the charts at 68.25GB/s [1]. That's twice as fast as high speed ddr4 memory (DDR4-4000 at 32GB/s).

In other words: they totally trounced and took it to the next level with regards to memory, because they can. If anything, memory control is their biggest advantage. Scaling the amount of ram won't be an issue. Increasing the bandwidth perhaps, but it'll still be way quicker than what Intel or AMD offer. This seems like something their next gen M2 version could tackle as a somewhat low hanging fruit.

[1] https://www.anandtech.com/show/16252/mac-mini-apple-m1-teste...

wtallis · on Nov 17, 2020

It's twice the speed of one module of high-speed DDR4 memory. Mainstream consumer PC platforms all support dual-channel memory. Dual-channel DDR4-4266 would provide the same theoretical bandwidth as the M1's 128-bit wide collection of LPDDR4X-4266.

Intel's LPDDR support has been lagging far behind what mobile SoCs support (largely because of Intel's 10nm troubles), but their recently-launched Tiger Lake mobile processors do support LPDDR4X-4266 (and LPDDR5-5400, supposedly).

Symmetry · on Nov 17, 2020

Just to be pedantic DDR4-4266 is non-standard and so won't be found in any mainstream OEM's laptops. LPDDR4X-4266, soldered to the board instead of socketed and with a lower voltage, is indeed an official thing though.

wtallis · on Nov 17, 2020

Right, JEDEC standards for DDR4 only go up to 3200. But 4266 is within the range of overclocking on desktop systems and only a few percent faster than the fastest SODIMMs on the market, so it's at least somewhat useful as a point of comparison.

LPDDR memories are developed with more of a focus on per-pin bandwidth than standard DDR because mobile devices are more constrained on pin count and power. But Apple's now shipping an LPDDR interface that's just as wide as the standard for desktops, and reaping the benefits of the extra bandwidth.

masklinn · on Nov 17, 2020

FWIW the latest-gen consoles supposedly have memory bandwidth in the hundreds of GB/s. The PS5 supposedly reaches 448GB/s, and the XBX 336 to 560 depending on the memory segment.

redisman · on Nov 17, 2020

It's the year of system-on-chip. It does show great improvements at the trade-off cost of no expansions possible once the chip is made.

masklinn · on Nov 17, 2020

None of these systems seem to have on-die memory. Not sure about the XBS, but from the official teardown the PS5 doesn't even use on-package memory: https://www.gamereactor.eu/media/87/_3278703.png

And Apple certainly didn't wait for SoC to use soldered memory.

wmf · on Nov 17, 2020

Tiger Lake appears to have the same bandwidth at the chip level but AnandTech is using inconsistent methodology so it's not clear.

sfblah · on Nov 17, 2020

Isn't that just because all the memory is on the CPU?

zerkten · on Nov 17, 2020

Yes, but as other threads commented, this is not particularly new. It's similar to how games consoles have been designed.

It's a natural evolution, especially when MacBook Airs and the ilk are not really user upgradeable in the Intel form anyway. It's much harder for regular PCs to make this leap because one party doesn't have as much control.

masklinn · on Nov 17, 2020

> They call it 'Unified Memory'. It makes a lot of sense but I don't know if they are going to be able to pack enough memory in there.

Unified memory and on-die (or in-package) memory are different thing, and while the latter simplifies the former they're mostly orthogonal.

Unified memory means the physical and logical memory space is directly accessible by both CPU and GPU, at the same time, in their entirety: https://en.wikipedia.org/wiki/Heterogeneous_System_Architect...

pwinnski · on Nov 17, 2020

Putting the memory into the SoC seems to be a lot of how they're getting this amazing performance: the memory bus is now twice as wide as anything else.

But yes, it means that Apple is going to have to either really start jamming more and more into there, or develop a two-tiered approach to memory where slower "external" RAM can supplement the faster "internal" RAM.

wmf · on Nov 17, 2020

The memory bus is now the same width as every other laptop.

kemiller · on Nov 17, 2020

My guess is that they are releasing the M1 right now because it's the smallest SoC they're going to make and its yields are just barely enough to be viable. Once they get better yields on the 5nm process, they will start making the larger, more yield-sensitive SoCs with more RAM in them.

zeusk · on Nov 17, 2020

The ram is not on the same chip/die as M1, it's another chip that's put in the same package. Increasing RAM will not affect apple's yields at all since the RAM dies are not made by apple and are not 5nm

kemiller · on Nov 17, 2020

Oh, I see. It's still nevertheless likely that stronger chip packages will be bundled with more RAM.

jeffbee · on Nov 17, 2020

I don't think that makes any sense, considering that the RAM is not on the same die as the processor. It's on the module, yes, but they're not making it on the same process.

I realize this image is a schematic representation rather than an actual photograph, but here it is.

https://www.apple.com/v/mac/m1/a/images/overview/chip__fffqz...

psds2 · on Nov 17, 2020

The point being made is that the processors that they will package with more memory are going to exist on larger dies. When you increase die size you decrease yield so you need to have a mature process.

scottlamb · on Nov 17, 2020

> The point being made is that the processors that they will package with more memory are going to exist on larger dies.

Are they? I don't think this is how AMD does things—all their desktop and Threadripper processors are constructed out of 8-core chiplets. The higher-core count processors just use more chiplets per package, not necessarily larger dies. If Apple's already putting multiple chiplets on one package (core + RAM), I wonder if they'll use the same approach to scaling.

jeffbee · on Nov 17, 2020

Why does that follow? 4 cores and 32GB makes fine sense.

psds2 · on Nov 17, 2020

But why would you offer your i3 with 32GB when you know you are going to make i5 and i7 processors soon? Apple could offer 32GB here but choose to not offer every configuration at every level.

zeusk · on Nov 17, 2020

It's actually 8 core, 4 big (Fire) and 4 small (Ice)

a1369209993 · on Nov 17, 2020

> this image is a schematic representation rather than an actual photograph

If so, it's a incredibly shitty schematic. 0/10, would flunk any draftsman who turned this in for a class.

TylerE · on Nov 17, 2020

I wonder if the CPU can support mixed RAM.

E.g. have the on-die 8GB of "fast" ram, and then support 2 external DIMMs or something for "overflow", file caching, etc.

whalesalad · on Nov 17, 2020

That is a great idea. I think the software is going to be the hard part. You would need some kind of heuristic or software to manage moving memory between those two locations. That is just my initial thought I could be totally wrong.

trillic · on Nov 17, 2020

Every modern general purpose computer already has multiple layers of memory. This would just be an additional layer. The virtual memory subsystems in the OS will handle this. At the end of the day it's just caches all the way down. A workstation with 16GiB of "on-chip" memory would be like a huge L4 cache for the say 512GiB of "standard" DDR4.

I really like OSTEP's chapters on virtual memory if you're interested in reading more[1].

[1] http://pages.cs.wisc.edu/~remzi/OSTEP/vm-complete.pdf

nakkijono · on Nov 17, 2020

Yet, is it a requirement in a memory hierarchy to copy the lower level in the upper one? Like, all the stuff in you L1 cache has to be also in your L2 cache. E.g. if you would have 32gb of external DDR it would add only 16gb more to the packaged 16gb?

scottlamb · on Nov 17, 2020

It's definitely not, and in fact swap on a modern OS doesn't work that way. Something can be only in physical RAM, only in swap, or (when "clean"/unmodified vs the copy in swap) in both (allowing it to be dropped from physical RAM quickly). So one obvious approach would be to simply use the external RAM as swap.

Relatedly, I've heard of using Intel Optane as "slow RAM" for cold pages, and I think the idea there is also that it'd be in one or the other but not both. (Optane can be thought of as very expensive/fast flash or very slow/cheap RAM.)

sveiss · on Nov 17, 2020

It's not a requirement, and different systems in the past have made different choices.

https://en.wikipedia.org/wiki/Cache_inclusion_policy

whalesalad · on Nov 17, 2020

Thanks for the link, downloaded to read this evening :thumbsup:

usefulcat · on Nov 17, 2020

Pretty sure this is already a thing for NUMA systems, e.g. an Intel system with a pair of N-core processors. Each processor gets "its own" half of memory. Memory which belongs to the other processor is slower to access.

See also Linux cpusets (cset command), which can be used to control which NUMA nodes a process has access to.

quasse · on Nov 17, 2020

Luckily that's a problem that academics have been working on since the 1960s. https://en.wikipedia.org/wiki/Non-uniform_memory_access

You'd be surprised how many modern OSes are at least partially NUMA aware. Even Java already has a non-uniform aware allocator.

codys · on Nov 17, 2020

macos already has capabilities for compressing memory used by applications before finally giving up and paging it out. It seems plausible they could extend the code that supports that feature to push memory to external (or allocate it there in the first place if utilization isn't expected to be hot), and only pull it back after utilization indicates there is a benefit.

kemiller · on Nov 17, 2020

The same logic they already use for L1/L2 caching, and/or VM could pretty easily be adapted.

mutatio · on Nov 17, 2020

Maybe the memory controller could handle this, spilling onto the slower memory, much like swapping.

minimalist · on Nov 17, 2020

Perhaps this will encourage programmers to write true native applications again instead of wrapping web scripting languages in a browser and calling it a day.

yborg · on Nov 17, 2020

I'd say the opposite, this increases the speed of such apps to closer to what is now native speed.

minimalist · on Nov 17, 2020

I'm referring to the second point about memory that OP made. You know, having to allocate 700MB of memory in order to run an electron chat application.

imtringued · on Nov 18, 2020

This is more of a failing of the runtime that Electron uses and how it (ab)uses that runtime. Browsers were never meant to be run once per application. Sciter JS was supposed to be an Electron replacement but it didn't pan out. With some luck c-smile will stay motivated enough to finish it and once it gains traction there will be another attempt to opensource it.

bufferoverflow · on Nov 17, 2020

It's seems wasteful, but RAM is so cheap, who cares?

You can get 16GB for $52 on Amazon. That 700MB is equivalent to $4.64 one-time payment.

There are more important things to worry about, seems to me.

minimalist · on Nov 18, 2020

Not everyone lives in the first world. Not everyone has a new computer.

It is this sort of hubris that likely explains my feeling that personal computing has regressed in many ways for the average individual over the recent years. I'm not talking about the hacker who can run surf+i3 on their cyberdeck, I'm talking about the person with an 8-year old computer bought on sale or a 4-year old smartphone.

imtringued · on Nov 18, 2020

RAM is only cheap because people don't waste it. If every application was written with Electron or every executable was a Java program (including CLI commands) you would cry and beg for more memory efficiency.

kemiller · on Nov 17, 2020

Unless they are performance-bound, this is pretty unlikely. The productivity benefits are hard to ignore.

yjftsjthsd-h · on Nov 17, 2020

Why? Unless the browser is proportionately slower, they'll still benefit.

Symmetry · on Nov 17, 2020

It's certainly in package rather than on the same chip. The process required to make DDR is different from what you use to make the core's logic. The DRAM chips might be literally stack on top of the CPU chip, though. Apple does things that way in the iPhone IIRC.

herf · on Nov 17, 2020

Anandtech is reporting 62GB/sec memcpy, and the way-faster Geekbench results are the memory-bound ones (e.g., Gaussian blur is 4x faster!)

So I think the speedups like these are largely due to memory architecture.

mikechen233 · on Nov 17, 2020

It's not due to the on package or the united memory. M1 is using industry standard 128bits lpddr4. On package is still using regular dram chips. What seemingly is the advantage is the firestorm cpu has much wider pipeline and capable doing more load and store in-flight at the time. Also the cache is also able to provide low latency and the bandwidth numbers. Intel or amd is able to achieve similar performance in memory bound workloads if they designed the logic on cpu for that workload.

djsumdog · on Nov 17, 2020

Not to mention the e-waste and the inability for users to upgrade their own devices.

I wonder what the performance cost was of having standard memory modules. I suspect it wasn't significant and this is more of a move to prevent upgrade and increase consumption and waste.

This is another reason I really don't ever want to own another Apple device. They want more and more control over the system and they keep moving to policies that reduce the ability of regular people to repair. The performance benefit doesn't really seem worth it if I can't run any other operating systems except macOS on it.

spijdar · on Nov 17, 2020

Standard memory modules which were soldered onto the mainboard?

Given how Macbooks have had soldered memory packages for ... 8 years now, I don't think moving the memory onto the SoC was to lock out upgrade potential. It doesn't make upgrading any less possible than "completely impossible", and probably (slightly) reduces the overall cost/complexity of the board, slightly reducing the material cost/impact.

FWIW, in the future I imagine most processors will look like the M1, with additional memory available over a serial bus like OMI, used in POWER10. The "unified memory" will effectively serve as a giant cache for the CPU/GPU with slower peripheral memory used as a backing store.

gjsman-1000 · on Nov 17, 2020

Even if you could boot other OSs, you forget that there are no drivers written for them. Windows and Linux both would be unusable for potentially years after launch.

yjftsjthsd-h · on Nov 17, 2020

It would certainly help if Apple would release specs, or even just liberally-licensed XNU driver sources, but Linux has certainly gotten drivers without manufacturer help before.

stefan_ · on Nov 17, 2020

It's not magic memory, it's LPDDR4X-4266 or LPDDR5-5500. These work just fine without being in-package, so it's purely a cost cutting decision.

scottlamb · on Nov 17, 2020

That's a pretty significant step up from what they were previously shipping, right? The 16" MBP page says it has "2666MHz DDR4". Are any other major (laptop or desktop) manufacturers using LPDDR4X-4266 or LPDDR5-5500?

jeffbee · on Nov 17, 2020

I agree completely since for large projects 16GB is barely enough to run Bazel. I know that's kind of sad but often Bazel wants to load large graphs into memory. Combined with the fact that the graphics are stealing part of main memory, I don't think I'm going to be happy with 16GB. I ordered one anyway, but I don't expect to be too psyched about the reality of it when it gets here.

mixmastamyk · on Nov 17, 2020

Sounds like the build needs to be optimized.

Eric_WVGG · on Nov 17, 2020

I'm afraid I lost the source, but this morning I was reading about some fairly in-depth Xcode benchmarks, the dev was saying that there was almost no hit in performance when it hit swap, and speculated that Apple might be getting ready to move past the concept of RAM altogether in a few years. Sounds a little bonkers to me, but the bus to the SSD is no joke

mikechen233 · on Nov 17, 2020

Next gen consoles and Nvidia rtx io is kind of doing that with graphics textures. Textures are memory mapped but stored in nvme storage. When GPU reads a texture, it first looks in the ram, if not found in the ram, the controller talks to nvme controllers to load from storage.

If you think about it, gen4 pce nvme can reach 5GB/s doing 16k ramdom reads. The bandwidth is getting close to ddr2/3 territory. And new storage tech like 3dxpoint will have ram like access latency to improve small io perf.

You will always still need ram, but you can be more efficient at how to use the ram.

Eric_WVGG · on Nov 17, 2020

here's the source of that comment: https://twitter.com/panzer/status/1328708737482121217?s=21

I'm still a little unclear on whether he means "concept of RAM" in a marketing sense, discrete RAM, or a model closer to L3 cache. regardless, pure speculation

tannhaeuser · on Nov 17, 2020

Isn't fast on-die/on-package (not sure either) memory access a key factor for Apple's gigantic performance leap, though? With memory pre-soldered on small notebooks for years now, it's not much of a difference for consumers anyway once more than 16GB becomes available.

epistasis · on Nov 17, 2020

I'm not sure that the soldering does much to improve memory access.

The anand tech M1 article measures memory latency at more than 90ns [1], which is almost twice what I see for AMD and Intel benchmarks, at 50ns and 70ns [2]. If these are not comparable measurements, I'd love a correction!

It seems to me that there's significant room to improve the memory subsystem for Apple Silicon to reach parity with desktop RAM performance.

[1] https://www.anandtech.com/show/16252/mac-mini-apple-m1-teste...

[2] https://www.techcenturion.com/improve-zen-2-gaming-performan...

schultzer · on Nov 17, 2020

What people seam to forget is, that SOC is the future, and an upgrade should be as simple as switching one chip. This also comes with a lot of benefit with locality and performance.

timw4mail · on Nov 17, 2020

There's no way the SOC will be upgradable.

redisman · on Nov 17, 2020

No - but for desktop maybe that's the future? I could imagine AMD and Intel offering multiple tiers of SoCs instead of CPUs.

leptons · on Nov 17, 2020

No. That's a stupid idea for desktop. Not everyone needs 64GB of RAM, but some of us do - and that would be an expensive and awful upgrade to have to throw out your CPU with 32GB RAM on-chip just to put in the same CPU with 64GB on-chip. It makes absolutely no sense.

robotresearcher · on Nov 17, 2020

I'd tolerate that if it runs usefully faster every day. Could save money overall.

There are several parts in a workstation that are already monolithic packages. Most GPU+GRAM cards, for example.

leptons · on Nov 18, 2020

So you're talking about taking a $300 CPU chip and turning into a $500 or $800 chip depending on how much RAM is on the SOC. But the fact still remains that the vast majority of the RAM sits idle while the CPU processes through relatively small chunks of it at a time, which is what cache is for. If anything I'd rather see more cache RAM on the CPU itself than having to pay for a CPU + RAM on the same SOC. The performance gain from combining CPU + RAM on the same "SOC" is not that great enough to warrant doing that as an approach for all computers - Apple is doing this mostly because it saves them money. There's a lot of push back from power users about expandability not even being an option anymore. But 99% of Apple's customer base doesn't have any need for expandability or power-user features, or having the fastest silicon available because they only use it for facebook, facetime, or a few other basic applications. And that's fine. I won't be buying one though.

jug · on Nov 17, 2020

It's a radical decision yes, but also so very Apple-like. And I wonder how much of this wild architecture change is also behind the improved performance here.

jjtheblunt · on Nov 17, 2020

Maybe ARM binaries are smaller than x86_64?

ARM has some sophistication in its ISA, THUMB mode instructions for example, that might be in play here.

duskwuff · on Nov 18, 2020

Thumb isn't a factor. The Thumb encoding is only used for 32-bit ARM instructions; 64-bit instructions use a fixed-width 32-bit encoding.

peteradio · on Nov 17, 2020

What uses most of your ram when you need 16GB+?

swiftcoder · on Nov 17, 2020

Chrome. Slack. VSCode.

The proliferation of desktop software that are just more Chrome processes in disguise has been punishing on memory usage.

My work machine used to fully lockup on a daily basis with 16 GB of RAM. 32 GB seems to be the sweet spot right now.

masklinn · on Nov 17, 2020

> Chrome. Slack. VSCode.

VMs, especially since you need one to run docker on OSX, IntelliJ, Firefox, ...

sfblah · on Nov 17, 2020

People claim if you switch off to Safari the memory footprint goes down. Of course, if you're an FE dev, that comes with more ... challenging ... dev tools. YMMV.

ArchOversight · on Nov 17, 2020

I run Slack as a tab in Safari because it avoids the pain of running the electron app, and reduces my memory footprint.

whalesalad · on Nov 17, 2020

You can never have enough ram!

peteradio · on Nov 17, 2020

Haha! I somehow became sated with 16GB on my laptop. Now any of my workloads which require more would simultaneously want higher cpu, so those loads get pushed to a different machine entirely. So I'm curious what is the use case for 16GB+ with a laptop cpu.

kilburn · on Nov 17, 2020

For some people the main advantage of a laptop is the posssibility to move to spots where there's no or very bad connection. In those cases it becomes really useful to be able to run all your workloads on it, even if much slower.

DoofusOfDeath · on Nov 17, 2020

For some projects, linking non-stripped binaries can take huge amounts of virtual memory. At least on ELF/DWARF platforms with the Gnu linker.

Not sure if that applies here, since I assume the Mac will be using the LLVM linker, and Mach-O != ELF.

yjftsjthsd-h · on Nov 17, 2020

Virtual memory, or real memory? You're never going to run out of virtual memory on a 64-bit platform.

DoofusOfDeath · on Nov 17, 2020

Virtual memory. I mentioned it in this context because running out of physical memory means the machine starts swapping, which often results in a very undesirable slowdown.

Lendal · on Nov 17, 2020

Anything built with Electronjs.

dom96 · on Nov 17, 2020

I actually feel like 16GB is pretty much the sweet spot. I've built a desktop PC recently and didn't bother with 32GB and haven't had any troubles. The only time I've run into limits is when I was doing things like running games and big IDEs at the same time which seems like a waste in any case regardless of how much RAM is available.

nullandvoid · on Nov 17, 2020

For me on windows, multiple virtual desktops, browser (chrome for work, ff for personal) IDE's and VMs (ubuntu running in wsl, and any docker containers) made 16gn unuseable for me

And that's before I decided to boot up a game with all of the existing applications

For me the £50 is well worth not having to care about pruning applications constantly!

vmception · on Nov 17, 2020

I recently got introduced to great suspender here and Chrome with separate accounts doesn't use much resources because tabs get backgrounded.

Yes, you would still have to juggle if you really dont want to close your IDE, or lower judge how much RAM that docker container really needd, but also understand that OS’ and software allocate a huge portion of all available memory no matter what you have in the machine. Like you think you need 32gb for snappy performance and it is rational idea that all your processes absolutely need sequential memory blocks, but it isnt that true.

I honestly think you would just be smarter about your use of resources again.

With these benchmarks I am starting to lean more towards 32gb itself being the compromise. Simply because it doesnt cause you to budget resources, a luxury, but at the expense of these other benchmarks? And in the worst case we just have to wait a year or two before 32gb is offered in the M series package?

spullara · on Nov 17, 2020

I wonder if they have improved their memory compression with this new release as well. Would have compare apples to apples for various applications to see the effect.

phren0logy · on Nov 17, 2020

I would agree if it wasn't doing double-duty as memory for the GPU. Under those circumstances, even 16gb can start to get a bit tight on some tasks.

ericd · on Nov 17, 2020

Really depends on what you’re doing. If you’re working with large amounts of data, being able to put it in memory is super helpful.

andy_ppp · on Nov 17, 2020

Is the new architecture better or worse with memory usage than x86 macs?

zerkten · on Nov 17, 2020

Memory usage shouldn't change that much for a similar task. It's just that paging stuff in and out is much more performant when the system is under pressure. Right now with the least amount of M1 native software available, most of it will be allocating memory natively without garbage collection.

Apple's strategy has been to avoid generational/tracing GC which reaps big benefits in terms of memory usage. It'll be interesting to see more feedback from devs running a broader range of software. It's likely people will hit apps on the long tail that use GC, and similar schemes, which will cause them to complain about memory issues. Running old apps under Rosetta 2 will be another source of complaints.

These machines are optimized for the mass market. Although they can out perform many existing machines with 8 and 16 GB of RAM, there is the huge opportunity and demand for 32GB+.