Google Cloud Platform is the first cloud provider to offer Intel Skylake (googleblog.com)
306 points by rey12rey on Feb 24, 2017 | hide | past | favorite | 206 comments


Just for reference, since you don't choose your processor explicitly on GCP, but instead choose a zone with homogeneous processors, here is their current processor/zone layout: https://cloud.google.com/compute/docs/regions-zones/regions-...

Since some GCP engineers are watching: presumably we'll see new zones providing these processors, or will it be a limited release within existing zones? And if so, will you be moving away from homogeneous zones in the future?


Good question! We've made this possible via a whitelisted (Alpha) feature that lets you make sure you're creating Skylakes. We'll hopefully have more details to share at NEXT in a few weeks (or immediately, if you sign up!).

Disclosure: I work on Google Cloud.


Where's the form to apply for Alpha? :)



It bothers me a bit that these are listed as "32 cores" when they are 16-core/32-thread machines.


FWIW, Azure disables hyperthreading on their servers. 1 core = 1 real core.


I try to consistently say "vCPUs", but yeah...


Given the marketing toward HPC tasks, shouldn't hyper-threading be disabled?


No? We make sure that pairs of vCPUs (hypertwins) stay together. If you'd like, you can offline your threads in the guest for a fairly similar outcome. Some of us regret that we sell an n1-standard-1 and the shared core machine types, but it's pretty nice to have a $5/month VM price point for those that want it. Most folks doing serious numerical computing end up using much larger VMs, and always crossing NUMA nodes (which is a much larger impact).
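For anyone wanting to try offlining threads themselves, a rough sketch from inside a Linux guest (these are standard sysfs paths; which CPU numbers are siblings depends on the topology, so check first):

```shell
# See which vCPUs are hypertwins of cpu0
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list

# Offline one sibling of each pair (requires root); repeat per pair
echo 0 | sudo tee /sys/devices/system/cpu/cpu1/online
```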

Disclosure: I work on Google Cloud (and historically focused solely on Compute Engine).


I'm curious, what is there to regret about having a shared core machine type? Some people have a use for them and appreciate the cost savings.


Disabling hyper-threading only makes sense if you're getting messed up by the OS's scheduler.


I've been benchmarking these against Haswells and Broadwells. Despite the clock being 300 MHz slower, we're seeing between 5% and 45% faster results on linear algebra functions that we run a lot, even without doing much work to tailor to AVX512 instructions yet.

The cache is also a whopping 56 MB.


Hi Zach! Are you using MKL at all under the hood (either through numpy, scipy, etc.) or is that just from AVX2 running much better clock-for-clock on Skylake?

Disclosure: I work on Google Cloud.


Not using Google Cloud, but Intel just released MKL 2017 Update 2 today, which automatically dispatches AVX512 code on supported Xeon CPUs.

Doing a quick comparison between our Haswell and Skylake servers, we do see a nice little speed bump of ~10-20% on matrix-heavy code. The bump is negligible for most other things.


No MKL, just a mix of AVX2 and a bit of AVX512 written with intrinsics in C++. Working on benchmarking Eigen at the moment, which is also AVX512-optimized.


Still limited to a measly 208 GB of RAM? That was pretty good back in the day, but now my desktop workstation has 128 GB. AWS offers up to 488 GB on their latest R4 instances, and 2 TB on the X1 instances released a year ago.


From the November post [1]:

> You'll be able to use the new processor with Compute Engine’s standard, highmem, highcpu and custom machine types. We also plan to continue to introduce bigger and better VM instance types that offer more vCPUs and RAM for compute- and memory-intensive workloads.

[1] https://cloudplatform.googleblog.com/2016/11/power-up-your-G...


I use 128 GB too, but here we call it SSD.


Damn. Out of curiosity, what do you use that much RAM for?


Database


I honestly can't think of any viable purpose for that much RAM unless there's a huge amount of waste happening


How hard are you thinking? I can think of numerous scenarios where an entire working set being in RAM is useful and occasionally essential, and it is quite common as an optimization in several industries. Finance comes to mind (kdb+ is designed to weaponize tons of RAM), as does high-scale Web operations. This is part of why solid state NVMe devices are compelling, too, to help bridge the gap, but it sounds like not for your purposes.

Even on the less complicated end, some years ago throwing a shitload of page cache at MongoDB was the only way to maintain high write loads, and eventually one reached a point where one had to keep an entire shard in cache. That bottom end threshold is lower than you think. I don't know if that's changed.


I could be wrong, but it really sounds like you're talking about production. If that's the case, then yes, I can respect the RAM requirement; it's almost too obvious to state that it serves a purpose.

The key here is that it's a personal development workstation on which he has these resources. If that's a requirement, I would argue there's something fundamentally wrong with a development process that needs that much data locally, without even a server involved.


We use multiple machines with 6 TB each at my job, all for use with databases.


No - I believe these will be up to 64core & 416GB


Nope. Bigger :).


Interesting, are you guys decoupling the 1.6 GB RAM-per-core requirement?

I'd love to spin up a 32x512GB for a DB Server.


It's been awesome to see our Skylakes rolling in over the past several weeks. I personally have been waiting nearly 10 years for AVX-512 ever since playing with LRBni.

Disclosure: I work on Google Cloud (and helped a bit in our Skylake work).


Please relay back to your team that the moment you get Postgres for your Cloud SQL product you'll get a lot of AWS converts.


Please come to NEXT in two weeks! (https://cloudnext.withgoogle.com). Most of the registration becomes credits :).


Sounds like an upcoming Postgres announcement at NEXT :)


Out of curiosity, why would you prefer Google Cloud to AWS?


Spending on GCP is more predictable than on AWS. The general behavior is also more predictable: for instance, all preemptible instances are terminated within 24 hours, while on AWS it depends on when they need the resources.

I believe that whatever GCP does, it does better: faster network, faster spinning servers up and down, simpler quotas, etc. GCP doesn't offer as many services as AWS, and most of them are not very compatible with Apache tools (although they're doing their best; e.g. Bigtable looks like HBase from the outside).

In our case, we're running ~28,000 servers on GCE and utilizing many services, especially PubSub, BigQuery, etc. The price we're paying for our setup is roughly 1/4 of a similar setup on AWS.


> for instance, all preemptible instances are terminated within 24 hours, while on AWS it depends on when they need the resources

Uhh, Google will terminate preemptible instances whenever they need the resources as well. Hence the name, "preemptible."

https://cloud.google.com/compute/docs/instances/preemptible


True. What I meant is that the servers are going to be terminated within 24 hours, even if Google doesn't need the resources at the moment. AWS doesn't terminate them until they need the resources.

Also, I personally find the pricing much easier to understand. It's very consistent and simple: preemptible servers are discounted 80% from regular ones. On AWS the price changes, and they only offer older generations, while GCE offers any type at the same discount.


Holy cow, 28,000 servers! Can you give us a clue what you need so many servers for?


We're processing videos, a lot of them. Roughly 34PB worth of data every month.

Additionally, we're running our search engine on top of it, along with a couple of other services.

You can check what we're up to here https://pex.com


That landing page is a whopping 9 MB [1]. Please optimize it for users with slow internet like me.

[1] https://www.webpagetest.org/result/170225_RN_1K80/


I apologize. We need to pay more attention to the actual website.


Cool, thanks for the info!


It's cheaper for many workloads, the products work together a lot more seamlessly, instance start times are faster, and the orchestration UI is way, way, way more intuitive and user-friendly (to this user, at least). There are definitely more knobs to turn in AWS, but Google is almost at feature parity with them, and most workloads won't see a difference between the services in terms of features.


Wow, that's persuasive... is the only reason people are choosing AWS the sheer number of products they provide (e.g. Postgres RDS & Aurora)?


FWIW, that's not the only good stuff on GCE:

* Instances have sustained use discounts: you don't need to pay in advance and reserve them, and the discounts are applied very, very fairly.

* Preemptible instances are cheap as all hell, and they are perfect for batch work, celery workers, etc. (anything disposable). You don't have to bid for them, so automation is easier.

* Up to 2 Gbit/s per core of network throughput, depending on your workload type.

* Customizable instances.

* Live migrations.

The only thing I really miss from AWS is RDS's postgres.

I don't even want to talk about Azure.

Why do people use AWS? 1. Free tier. 2. The plethora of PaaS offerings, as you mentioned. 3. AWS was there first, has brand recognition, and is a safe choice for management.


Heh, I know you don't want to talk about Azure, but perhaps others can share -- what's so bad about it?

It's superior for running Windows at least, right?


> "Heh, I know you don't want to talk about Azure, but perhaps others can share -- what's so bad about it?" This is what I will say:

I'm working on a migration to it, so I'm not very experienced with it, but so far it's been very painful compared to GCE or AWS, on which I've run production stacks. I'd rather not comment further, simply due to my relative newness to the service and the chance that it's just lack of experience. The customer service moves, at best, at a glacial pace.

> "It's superior for running Windows at least, right?"

I'm a Linux guy running a platform agnostic Linux stack, but I'd assume so. I get the feeling so far that it's really good if you want to run MS-SQL and .net, and garbage otherwise.

The only reason we're migrating from GCE to Azure is because our GCE credits are expiring, and Microsoft gave us the YC Credits offer for Azure. We'd rather stay on GCE if we could. Also, Postgres RDS/Cloud SQL is the one thing we miss from AWS.


Actually, Azure takes a very strong platform-agnostic approach. Over a third of all VMs on Azure run Linux (https://fossbytes.com/33-microsoft-azure-vms-now-run-linux-o...), with supported distributions including Ubuntu, Red Hat, CentOS, SUSE, Debian, and CoreOS; the Azure Container Service supports three open-source orchestrators (Swarm, Kubernetes, DC/OS); and the Azure WebApp PaaS service has Linux support in preview.

Disclosure: I work at Microsoft; opinions are mine.


That's fair and you won't get any arguments from me there. It's very easy to run singleton Linux machines on Azure and I have no complaints there.

I wasn't going to lay it out here, but, as I have your ear:

I'm not saying it's impossible to run a Linux stack on Azure - but man, trying to image the machine, for example, is a whole rigamarole.

Want to run your own image on a scale set? Oh, well, you need to craft a JSON template, by hand. There also appear to be limits on how many machines can run off an image.

ARM is a mess (IMHO), and it's impossible to select a custom image when creating a new resource group. It also seems (correct me if I'm wrong) impossible to change the vnet of a VM/ARM after it's created, and it seems like ARMs can't share an existing vnet. Again, please correct me if I'm wrong; I'm new to this service.

I may have to drop to running a bootstrap script to get my stuff working, but the idea of doing a curl | sh is pretty horrific to me, from a security perspective.

Non MS-SQL as a service? Nope.

The new managed disks are very nice. I like those a lot :)

Also, Azure times out my ssh sessions :(


For your ssh sessions, have you tried ssh keepalives? (e.g. the ServerAliveInterval option in an OpenSSH client)
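For reference, a minimal OpenSSH client keepalive stanza (values are illustrative):

```
# ~/.ssh/config
Host *
    ServerAliveInterval 60   # probe after 60s of inactivity
    ServerAliveCountMax 3    # drop the connection after 3 unanswered probes
```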


I added it to my config yesterday; I hope it helps :) Thanks for the tip!


Your GCE startup credits [1] or something else? Have you been in contact with our PMs about Pgsql?

[1] https://cloud.google.com/developers/startups/


Yep, our GCE startup credits. We've only burned through about half of 'em (from the billing page, it looks like ~45k), and they expire in late March. So if we had an extra year to use 'em, that would be amazing.

When I last chatted with our Account Manager I mentioned Postgres, yeah. We're currently running our own on GCE, but it would be awesome to have it aaS, with replicas and automated backups that I don't have to keep an eye on all the time :).


I don't want to fork the thread to discuss other providers (take a look at GitLab's experience on Azure if you'd like), but I will say that we support Windows Server, SQL Server, .NET, and Powershell throughout GCP [1].

Disclosure: I work on Google Cloud (so I want to sell you our services).

[1] https://cloud.google.com/windows/



I know security is not compliance, and I'm not here to compare the two, but Google goes to great lengths on security. [0]

TL;DR:

- Google's global SDN is ultra-secure, and Google carries your packets on its own network rather than dumping them onto the public internet. Google's undersea cables are even shark-bite-proof! [1]

- Google data centers are mostly homogeneous and rely on Google-built hardware rather than vendors. This greatly helps with securing infra: only one vendor (Google) to trust, only one set of best practices to follow, lower risk of misconfiguration, and minimal exposure risk.

- >600 security engineers.

- Encryption-at-rest and in-transit is ubiquitous.

[0] https://cloud.google.com/security/whitepaper

[1] https://www.youtube.com/watch?v=XMxkRh7sx84

(work on Google Cloud, not in marketing :) )


I can't find "ultra-secure" in any of our security certification docs. Is that better or worse than "super-duper secure"?

For security, I don't care whether my traffic as routed over a private network or public web as long as the data is encrypted, because a "private network" is only as private as thousands of miles of fiber and every single network facility it traverses can be. Which means 'not very'.

Does Google build its own hard drives/SSDs and CPUs? If not, then that's at least two other vendors to trust, since both CPUs and hard drives are relevant to hardware security.


Ha ha. Did I mention I'm not in marketing? I do appreciate your passion!

On your first point: there is a very strong risk of inadvertent misconfiguration, of your employees not following best practices, or even of poor documentation. Google Cloud by default gives you a global secure VPC that never traverses the public internet, so there's less risk because the baseline of security is high. Sure, you can run VPN tunnels between data centers; the point is, with Google Cloud you don't need to.

On the latter point, of course you cannot eliminate every vendor, especially in the obvious context of this specific Intel announcement. But again, not having three dozen flavors of configs and four network router vendors does make a difference. I would encourage you to read the paper I linked above (it discusses this topic in great detail), and also perhaps [0].

Also, security folks at Google, who are much more qualified to discuss this topic than I am, frequently post on HN. [1]

[0] https://cloudplatform.googleblog.com/2016/02/Google-seeks-ne...

[1] https://news.ycombinator.com/item?id=13388941


AWS did a lot of work to improve how they share compliance information over the past year. It wasn't much better than Google's reference before that. Azure beats both of them in my opinion, as you can access many of the raw compliance reports without an NDA.

It's also important to pay attention to which services you consume, as not all of them are certified to the same degree.


As far as HIPAA compliance goes, though, AWS EC2 requires dedicated tenancy, whereas GCE allows HIPAA workloads on any VM, including preemptible VMs.


This is true, however Amazon's standard BAA is a bit more robust than Google's. This didn't stop my HIPAA-compliant company from choosing Google, but it does make it more difficult to compare the two clouds on HIPAA-friendliness.


Can you say more about what you found less robust? And (hopefully) were you unable to get the legal agreement up to snuff by the end?

Disclosure: I work on Google Cloud (but not compliance).


There's been a lot of talk about this in the GCP compliance Slack; it seems like a lot of products are being audited and announcements will be made at Next.


Let's hope they are building the managed Postgres product from the ground up with an eye toward compliance!


Well, since the MySQL-compatible Cloud SQL is, I would imagine pg would be too.


AWS launched Postgres RDS in November 2013. They covered MySQL and Oracle RDS under BAA in August 2015. But AWS didn't add Postgres RDS to the BAA until November 2016. We're hoping GCP doesn't delay covering Postgres Cloud SQL under BAA like AWS did with their managed Postgres.


In organizations I've worked for, we use AWS because it's politically correct to do so; non-technical leadership has heard how "Cloud" is an alternative to "IT" and have golf buddies that buy "Cloud" from Amazon.

Technical management can sometimes be persuaded, given overwhelming evidence, that competing products/services are superior or more cost-effective but ultimately conclude "we'll never sell this solution to leadership".


Managed Kubernetes.


Agreed. I'm on GCP whenever I get to pick.

For anyone else who is on AWS (maybe because of rds postgres, or because it's client work that needs to be on AWS), you can go outside of vanilla AWS to get a similarly great dev UX. I've used Convox + the Weave ECS AMI on a project, and it was pleasant.


What's the difference between using Kubernetes on GCP and running Kubernetes on a node on AWS EC2? By "managed", do you just mean I don't need to manage the master node?

Just getting into Kubernetes :)


That's correct. The master nodes are managed by GCP, and upgrades to the Kubernetes system are trivial.


How much of a boon is that, though? Isn't Kubernetes built to be self-managing on any platform?


Kubernetes manages containers, but what manages Kubernetes? GKE just makes the experience very painless, updates, etc.


It's better than fully self-managing it, but GKE is still less managed than I'd personally like. I just want to submit containers as jobs to be executed in "the cloud", with the cloud provider figuring out the details of where they get executed. On GKE, you have to create a cluster in the cloud, then submit your jobs to that cluster, and set up logic to manage provisioning/resizing/etc., rather than just being able to submit your jobs to a giant abstracted cluster managed by Google. Joyent's Triton container service basically does that, so it's possible.


Have you tried the App Engine flexible environment?


GCP is generally cheaper than AWS.


Killer network.


I wonder why there isn't more interest in SIMT-style approaches among CPU manufacturers; instead they're going for longer vectors with scatter/gather/masking (AVX-512, and SVE by ARM/Fujitsu).


There are a few threads asking about various features (SGX, TSX, etc.) so I want to make a top-level comment: we're not ready to share more today (sorry).

Disclosure: I work on Google Cloud.


It's okay you're not ready to share more today, but do you know/have an idea of when you will be?


Why is it important to offer Intel Skylake on cloud platforms? Is there some specific processor extensions present in Skylake that make them particularly compelling in a cloud environment for a particular industry or a particular set of needs?


Skylake is an amazing upgrade, more so than even the Haswell bump. Beyond just the clock-for-clock kind of improvements, AVX512 is a huge win for many vectorized workloads (see my other comment about waiting nearly 10 years for it...). Linear algebra (or ML if you prefer), rendering, and even string operations (operating on 512-bits per instruction is a win over 256) are all a lot faster.


The other fundamental change is that most instructions now have masked variants, so you can apply an operation to only specific elements in a vector based on a mask register. This avoids branch mispredictions, which are a big performance hit in some workloads.


Yes, I started syrah (https://github.com/boulos/syrah) in 2009 precisely because I loved the masking of LRBni and wanted to "use it" even from SSE, AVX and (later) AVX2. Once I played with NEON though, I realized I had screwed up the format for it.

Sadly, (because KNL doesn't have nearly the same volume) until Skylake became a real thing, I had no reason to update this. I'm planning on dusting it off now!


Not asking you to reveal any internal plans, but do you think there's any chance that some form of KNL could show up on GCP in the near future?

FWIW we're currently using GCP, generally love it, and I'm looking forward to trying out Skylake...


what do I have to consider as a software developer to benefit from these upgrade? Do I need special tools/libraries?


Simple process improvements that make Skylake faster will just be "there" -- some code will just be faster with no changes!

For the numerical, AVX stuff -- it should be mostly automatic for you, if you're already using optimized libraries at the core -- MLK, BLAS, that kind of stuff. They'll be transparently upgraded for you -- ideally -- to take care of these things. They normally check what your CPU is at runtime, and pick the fastest implementation among a few different choices it has.

You will need toolchains to support this all, but for the most part that likely won't be a burden unless you want to get your hands dirty and start it yourself -- inevitably, this should all mostly be "pre-canned". Your optimized linear algebra, vector, and math libraries are what will mostly concern themselves with this, not you necessarily. In fact, several of the things already available can probably use these new extensions! I bet if you're using Intel MLK for example, it will probably "magically" get faster on these Skylake machines by using AVX512 automatically.

If you want to understand more: you can always go grab an SSE/AVX reference, check your /proc/cpuinfo, and write a few simple things on your own to get a feel. Your toolchain will definitely support it :)


> I bet if you're using Intel MLK for example, it will probably "magically" get faster on these Skylake machines by using AVX512 automatically.

(Aside, I think you mean MKL.)

There is a 200 MHz clock reduction when running AVX512 instructions. If your code makes heavy use of AVX512 there is of course still a big net win, but I'm curious about the impact with more heterogeneous workloads. We have an app that is a mixture of scalar and vector code. Some, but not all, of the vector code would benefit from 512-bit vectors. But how much does the clock slowdown when running this code bleed over into running the other non-AVX512 code? I guess I'm asking how quickly it clocks down, and how quickly the full clock speed is restored. Worst case, it seems you could be running full-time at a 200 MHz slowdown due to blocks of AVX512 instructions scattered throughout the application. Is that a valid concern?


Thanks for the correction. And FWIW, this is an insanely good question that's hard for me to immediately answer! The things I want AVX512 for are very fat SIMD registers for my cryptographic code, and I've only lightly kept up with it since AVX-512 was pushed off to Skylake-Xeon only. So I haven't worried about highly heterogeneous workloads (in my head).

I'd have to look up the specifics; but does AVX512 simply slow the clock, or does it actually have some kind of limited number of hardware ports? I wonder if some clock slowdown would be very much of an issue, since clock-for-clock, you should see better performance on Skylake anyway.

Just curious, what kind of workloads do you think you're looking at here?


I'm having trouble finding good references now, but I'm sure I remember reading that it was simply slowing the clock.

In my case, yes, Skylake would still be a win over older hardware, but the question is whether to use AVX512 or not. The workload is a real time animation system with a bunch of nodes in a graph that get evaluated in sequence. Some nodes would benefit from AVX512, but others would not. So the question is, if we vectorize those nodes that would benefit and get a speedup there, will the other unvectorized nodes now run slower as a result of the lower clock speed, canceling out the benefit.

It sounds like your case is a much better fit for AVX512. Out of curiosity, have you tried running on Xeon Phi, which also supports AVX512?


For the "clock-for-clock" improvements: nothing. For AVX512, various numerical libraries will be adding support for you over time. However, just as with Haswell and earlier generations, even AVX2 will run better on Skylake.


MPX (hardware bounds checking)

SGX (secure enclaves)

TSX (Hardware Transactional Memory)

SGX can store SSL keys in a hardware protected enclave that can't be accessed by hypervisors/AMT so that can be useful for security sensitive stuff.

Cloudflare probably could have used MPX to prevent their recent leak with minimal performance overhead.

TSX is cool.


Unfortunately MPX sounds cool but is a lot of fluff IMO. It is absolutely not "minimal overhead" in its current implementation and it is very invasive in some ways, and breaks many programs. (It also doesn't even provide temporal protection, and is slower than ASAN in those modes?) There's probably a reason almost nobody uses it from what I see; it sounds pretty half-baked...

Yes, it might have stopped the CloudFlare case, if they were willing to pay (I believe) for L4 over-read protection overheads (2x I believe, with a LOT of variance between GCC and Intel's compiler), and give up multithreading[1] in their application, and probably deal with other false positives and unsupported things.

SGX is completely neutered in Skylake and totally useless without getting your enclave signed by Intel, unless Google has struck a deal or something. You can mostly ignore it. It's good for preparation of your application against future processors where you'll have control over this, though, I guess...

TSX would be nice, yes. It's deeply annoying it's taken them so long to get right and that they've stratified that feature amongst CPUs -- is there any _real_ reason my Kaby Lake XPS13 can't support TSX? I doubt it other than "market segmentation makes us more money". I guess now I'm just ranting, though.

[1] https://intel-mpx.github.io/overview/


Wasn't TSX completely disabled in all Intel processors because of silicon bugs? If it is enabled in the Skylake E5 Xeon series that Google is rolling out, does it mean the Skylake E5 Xeons are the first processors to have a working TSX implementation?


AIUI that was just certain flavors of Broadwell and possibly all (or maybe just all-released-at-the-time) Haswell chips, so later Broadwell steppings had a working TSX implementation.

(As an example, my Xeon D-1540, a Broadwell-family chip, advertises the TSX bits in CPU feature flags, so it's not errata'd off there.)


I don't think Google provides SGX on their cloud instances (SGX is extremely limited right now, since any code you're placing in an enclave must be signed by Intel).


But I'd be interested to know if there are any plans to do so. Google Cloud people reading this - are you there?


See my new top-level comment (https://news.ycombinator.com/item?id=13726871).


Yes it's in the article.


Your wording on this seems unnecessarily harsh: the article mentions AVX-512 but doesn't go into much detail about that or mention anything else. Some people may immediately know that their code will benefit from AVX-512 but that's only relevant to a subset of workloads.


Amazon's c5 (Skylake) series shouldn't be far behind...

https://aws.amazon.com/about-aws/whats-new/2016/11/coming-so...


Any sign of IPv6 support on the horizon? Slightly embarrassing in 2017...


Honestly, Amazon EC2 just got VPCs with IPv6. My ISP still doesn't have native IPv6 either.


I wonder if Skylake would offer a material improvement for our workload. We don't necessarily use AVX-512, but we do use a heck of a lot of CPU resources on the current architecture. We are a python/elixir shop.

Great job GCP team!


Did Intel actually enable the TSX extensions in Skylake? If I'm not mistaken, they shipped it in the last couple of generations but disabled it after release. (Something like that?)

It's something that I've wanted to play with for sometime. It's cool that GCE has them available as a service.


Mobile chips have them disabled, I believe.


It's available on my laptop at least:

  $ egrep '^(model name|microcode)' /proc/cpuinfo | head -n2
  model name	: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz
  microcode	: 0x9e

  $ egrep -o ' (hle|rtm) ' /proc/cpuinfo | head -n2
   hle
   rtm

This is a mobile chip and an old stepping (and running the microcode that supposedly addresses the issue, presumably by disabling it on 'bad' chips?), so I'd be surprised if newer chips (especially server ones) didn't have it.


Hmm, TIL. I think newer ones had an erratum (or many) and Intel decided it wasn't worth fixing. Disappointing, but meh, it is probably true that not many people use transactional memory on laptops.


Constructive criticism:

Your calculator page is unusable on mobile due to fancy "material" form filling.

https://cloud.google.com/products/calculator/


You are too polite. I've reported this before, and I'd easily add an expletive in front of "unusable". Worse, it's not even viewable on mobile.


Also, the bandwidth calculator seems to be broken; I can't add bandwidth charges for services through the calculator.


I'd rather wait for Ryzen. You won't know which Skylake processor you're getting - the gimped one or the non-gimped one. AMD tends to keep their features consistent across the line.


What do you mean by "gimped?" As in the consumer chips without AVX512 vs. the server chips with AVX512?


Yea, meanwhile AMD tends to keep the features across both desktop and server lines of CPU the same.

And the pre-release Skylake server procs seem to be gimped, missing a few features compared with the official-release Skylake server procs.


Depends on the workload. A compute bound vectorizable problem could be four times faster on Skylake.


Why 4 and not 2 (512/256)?


The masks on AVX-512 can sometimes lead to more than a 2x improvement.


Zen doesn't even have full-width AVX2 (256-bit) units, so the comparison is quite invalid.


And Ryzen isn't AMD's server product.


Naples doesn't have it either, it's the same CCX.


Does Google provide machines with AMD processors?


No, we don't provide any machines with AMD processors in Cloud, today. Fun historical fact though: the 'N' in n1 meant 'Intel'.

Disclosure: I work on Google Cloud.


Interesting to know about the 'N'. How about g1 & f1? Any history behind those?


I came up with the naming scheme here. We picked 'n' for Intel, as 'i' could be confused with i3/i5/i7.

Early work was done on AMD, and we had an a1 series pre-launch.

IIRC, g is for Google or Generic where there are no promises on architecture. And f was for fractional.

I've been out of Google for 2+ years now so reality may have drifted from that original scheme.


Any plans on doing that in the future?


During the Ryzen launch announcement, GCP was listed among their partners. This may only be due to the availability of AMD GPUs, but considering it was a CPU-specific talk, I'd guess the magic 8-ball says "outlook good".


Based on public talks by GCP engineers, it's very unlikely. Maybe SEV (encrypted VMs) might change that. However, the small glimpse into GCP that you get from looking at their work on KVM would suggest no.


Whenever I want to try GCP, during signup I get stuck at "Account type Business" and the need to enter a VAT number.

It hints that there are individual accounts, but I see no way to select that option.


The issue is VAT in each country in the EU (which I assume you're based in). We don't (currently) collect VAT, but as businesses are required to represent that they'll pay it in their home countries, we can allow you to use it as a business. It pains me (and others) deeply that we "block" individuals like this, but that's how it is today.

Disclosure: I work on Google Cloud.


That is a weird restriction given that I can buy from the Play Store, etc. But thanks for your response.

Back to AWS it is.


For what it's worth: which country are you in?


I think if you are trying to create a trial account while logged in to your personal Google account, it should give you that option.

disclaimer: personal opinion. I work at Google but not related to this area.


This is what I tried (logged in via private browsing window), but no luck.


Can you send me the screenshot of the business account type part of the screen to my email sergey.sambros | gmail.com?


It would be quite nice if they actually advertised the particular type of CPU core you rent, rather than some abstract unit of computation. Or at least some kind of performance baseline.


We do! As mentioned up thread, each zone (today) has a consistent CPU platform:

https://cloud.google.com/compute/docs/regions-zones/regions-...

For example, in us-west1-a, you're getting a 2.2 GHz base clock E5 Broadwell.

Disclosure: I work on Google Cloud.


Brilliant!


Side question: are these extensions available in the desktop (i7) parts? Wanting to test out some optimisations for some code I have.


Nope, only on Xeons.


No mention of SGX; a major Skylake feature for cloud computing. Is it enabled? Is it accessible?


They use KVM, and considering that host support isn't even available in mainline Linux, I doubt it.

It would also be mentioned in the article if it were.


Hopefully it will be the first to offer the much cheaper AMD Ryzen/Naples, too.


Any idea on Naples timeframe for retail? I'm hoping to build a small workstation in a few months and 32 cores sounds like it might let me browse hacker news faster.


Not sure about Naples, but Ryzen is initially being released only as 8c/16t parts; the lesser variants will be released in about 3-4 months. AMD said in January that Naples will release in H1 2017. Presumably the Naples release cadence will be similar?


The 32c/64t part is the only one AMD is talking about, so I'm guessing it will be like the Ryzen client launch: release the big chips first and then work your way down. All signs point to Naples being released in Q2, hopefully not in the way "Q1" for the Ryzen client parts meant near the end of the quarter :/


Q2, or Q3 at the latest I think.


How much confidence do you have that Rev. A Ryzen parts will even work?


That's a bit too extreme, don't you think? Disliking AMD is one thing, but claiming that the entire initial run of an AMD processor might be faulty is just silly.

My guess is that you have no idea how much effort goes into verification and testing of something as complex as a microprocessor. A significant chunk of NRE costs goes into verification and test AFAIK.


Around 80-85% is the projected cost of verification and validation (pre- and post-silicon). This is what historical data shows for us (a CPU design firm doing ARMv8).


In fairness, there's precedent.

Though that was the FPUs in Intel processors, not AMD. So it's not very good precedent.


Intel's FDIV bug was an outlier. Besides, nowadays most of the ISA is implemented in microcode[1]. There are two advantages to this approach: 1) it is much easier to verify the microcode unit (it's simpler/smaller), and 2) it allows CPU vendors to "fix" ISA implementation issues post-release by issuing microcode updates.

[1]: ISA => microcode is equivalent in some respects to C => LLVM IR


Outlier? I would say uncommon but not an outlier. Here's a non-exhaustive list. http://wiki.osdev.org/CPU_Bugs

There was also an issue with transactional memory on both Haswell and Broadwell. The fix was to disable TSX support via a microcode update. I wouldn't call disabling a fix, personally. I doubt Intel even compensated the folks who bought it for TSX support.


Yeah, that list is far from exhaustive. Years and years ago, Google's websearch cluster validation suite found a tricky bug that isn't on that list. The exchange with the CPU manufacturer was amusing.


Given how many Intel/AMD/etc. CPU releases there have been, I would still refer to such glaring implementation bugs as outliers.


Most of the ISA may be implemented in microcode, but the parts that matter are certainly not microcoded!

For example the TLB access and the fast path of the TLB miss are not microcoded. You only get to microcode if you have to set an accessed bit or a dirty bit, or if there is a page fault.


Not really. AMD's Phenom was a flop and the initial batch had a horrible TLB bug. Their FX processors were also hugely hyped and hugely disappointing.


OP said "will [they] even work?". That's a very different thing from a TLB bug. Although I concede that depending on the nature of the TLB bug, it may lead to a significant performance hit.

Regarding your criticisms of AMD's past processors, are you coming at this from a consumer standpoint, or are you just disappointed with their architecture and/or implementation?

I ask because the Phenom II seems quite loved by its users[1], and the FX series was also quite good for its price. I had a FX-8320 in my last PC and it was quite a capable CPU in my opinion, and many others seem to agree[2].

[1]: https://www.newegg.com/Product/Product.aspx?Item=N82E1681910...

[2]: https://www.newegg.com/Product/Product.aspx?Item=N82E1681911...


For datacenter operators there is no difference between "performance is so crippled that the TCO doesn't pencil out" and "does not work." No difference at all.


My guess is you haven't been in this business long enough to comment. Rev. A AMD "Barcelona" parts did not work in any meaningful way. They all had to be thrown out.

http://www.tomshardware.com/forum/246780-28-issues-stop-ship...


How much confidence do you have that Rev. A Skylake parts fully work? http://semiaccurate.com/2016/12/27/intel-cut-skylake-epearly... (apologies in advance for the tease)


>In our own internal tests, it improved application performance by up to 30%.

this post would have been interesting if they had included those tests.


This is from our own internal workloads, including things related to Search. If you run say specint_rate2006 (as Intel did for Haswell to Broadwell [1]), you'll see even higher results. We just wanted to call out that even for our own internal benchmarks, these things are seriously great (and we don't make much use of vectorization!).

Disclosure: I work on Google Cloud.

[1] http://www.intel.com/content/www/us/en/benchmarks/server/xeo...


Would a typical load balancer/web server running NGINX doing SSL termination see an improvement switching to Skylake?


+1 for SGX support. Would be really cool to remotely attest the software you are running in a real-world cloud.


Are these E5-skylakes?


"Yes"; most systems at Google, including those in Cloud, are dual socket. [Edit: To be clear, we're not sharing the SKU, but this is a real server-class Xeon.]

Disclosure: I work on Google Cloud.


Are E5 Skylakes available exclusively to Google? Can't find E5 v5 on Intel's site: http://ark.intel.com/#@Processors.


Purley is the big Skylake-based server platform reset for Xeon E5 and higher, so Google must have early access to it.

Skylake Xeon E3 is on the current, older server chipsets.

Purley is supposed to be a significant improvement.


We announced with Intel back in November that we would be getting early access to Skylake [1]. Today's announcement is that we're ready to let more people in the door.

[1] https://cloudplatform.googleblog.com/2016/11/power-up-your-G...


Thanks!


E3 is not a server SKU; it's meant for workstations and is more similar to the i7 than the E5. It lacks various features, including ECC RAM, which I suppose is a non-starter for cloud hosting. Also some virtualization capabilities are missing, notably accelerated interrupt injection (APICv + posted interrupts).


E3 does support ECC RAM. I'm running a Haswell Xeon E3-1225 with 16GB ECC RAM as my workstation.


I believe that the ability to use ECC RAM is actually the main differentiator between i7 and E3.


I've seen E3's in low end servers like the Dell T20...


I've only seen E3s on Intel's web-site. But maybe E5s are coming? That'd be awesome.


Oh, so now I can't voice my opinion until I've been "in the business" for x amount of years? Yeah, nice try.

Another commenter already brought that issue up, but thanks for pointing it out again. I still think that it's quite silly to claim that Ryzen Rev. A may end up being a paperweight based on a mistake that took place a decade ago. Whatever floats your boat, I guess.

And from what I read, it seems like it was an extreme edge case, so the TLB error was triggered only during specific workloads. Sucks to be AMD back then.


Please don't engage in flamewars on HN.

We detached this subthread from https://news.ycombinator.com/item?id=13725460 and marked it off-topic.


All I'm saying is it would be better if you knew what you were talking about. AMD has been outsourcing verification to its customers for many years.


> AMD has been outsourcing verification to its customers for many years

You have no idea what you're talking about if you think for a second that a large CPU vendor like AMD would delegate verification to "its customers". It's like saying that Boeing just builds planes and tells airlines to make sure they fly correctly before allowing passengers to board them.


> You have no idea what you're talking about

This is uncivil and not ok on HN. Please take time to edit this kind of swipe out of your comments here.


Seriously? So when I see someone on HN who I think is clearly wrong, I can't say that? I agree that my tone could be improved, but I only said it that way in response to the parent's tone.

Also, how was our argument a "flame war"? It only lasted for 3-4 replies, and was quite civil in my opinion.

I tend to always agree with your decisions, but this one is a bit too extreme.


It's not about "tone" but content and it's quite simple, though always harder to see in one's own case. The HN guidelines say this:

When disagreeing, please reply to the argument instead of calling names. E.g. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."

Similarly, "You have no idea what you're talking about if you think for a second that a foo like bar would baz" can be shortened to, "A foo like bar wouldn't baz."

HN's rules apply regardless of how other commenters are behaving. If they didn't, we might as well have no rules, because it always feels like others are behaving worse.


Damn, those are some specific guidelines haha! I will make sure to dial down my responses in the future. Thanks.


[flagged]


This violates the HN guidelines egregiously. If you do it again we will ban you. The rules don't change when someone else doesn't know what they're talking about.

We detached this subthread from https://news.ycombinator.com/item?id=13725976 and marked it off-topic.


> I have worked both for AMD and for AMD's one-time largest customer. I know exactly what I am talking about.

Man, if you had said this along with your previous comment, we wouldn't be having this argument. Remember, this is the internet, where everyone is a self-made expert. I guess that I've grown accustomed to the "guilty until proven innocent" approach.

So you are saying that AMD doesn't verify its designs in-house? Is your information recent, or has it been some time since you worked there? Because I looked up verification jobs after I read your other comment, and it seems like they are actively hiring testing and verification engineers.

> You, on the other hand, are talking out of your butthole.

Nah, I'm typing on my keyboard.


You, on the other hand, are talking out of your butthole.

No matter how ignorant someone else may be, it's never okay on HN to use uncivil language like this. Your comment stands just fine without it.


A Google day is not responsive


I think it's funny that big Xeon upgrades seem to always occur approximately 8 months after big Power architecture blog posts.


Hahaha. Meanwhile I've been running 2 machines with Skylake and a combined 24 cores/48 threads, 256GB DDR4, 6x512G SSDs, and unmetered 1Gbit/s public and 2.5Gbit/s internal bandwidth, for over half a year, for a combined $150 total, in complete privacy and in full control of my hosts.. go dedicated, people.


The consumer Skylakes are not even in the same league as what we're announcing today. The easiest analogy is comparing a laptop to a beefy two-socket server.

Disclosure: I work on Google Cloud (and of course want you to pay us to use Skylake)


> want you to pay us to use Skylake

I already use a nice E3 Skylake workstation at work, thank you very much. It could be an E5 or a big i7 with more cores (I just rechecked on ark, and I guess it's still not available publicly for Skylake, but I also guess that can't hold for very long...), but we want it to be close to our product target, which is an E3, so E3 it is.

For tons of practical purposes, lots of reasonable E5s are comparable to what you can get with the biggest i7 -- and over the last generation, for tons of practical purposes, comparable CPUs have only progressed slightly from gen to gen. Of course there are workloads where you want the most insane CPU, or some CPU with new features / better perf in some niche workload, and so on, but once you have such advanced needs, I doubt a little that OTS platform solutions are better for the majority of people with highly advanced needs...

But yeah, among the mass of people, some will remain interested in "your" solution. Meaning mainly Intel's solution, given how much you advertise Skylake...


Well, we're talking Xeon E5. I frankly don't care if your stuff is 20% faster because I would pay at least 5 times the amount I do now on GCE and the hosts are not under my control. I could get 8 more cores for the price of one of your cores.


Who's your vendor? I'd like to get the same deal you're getting.


online.net. I got my machines off a promotion so you can't get the same setup probably, but they're very cheap anyhow and run other promotions all the time.


As another has said, who is your vendor?

I'm sure they would like some more customers :)


I'm guessing, perhaps unsurprisingly, that we won't find out.


I'm guessing, not surprisingly, that you felt really good typing this comment.


How I felt about the comment?


online.net. They run promotions all the time, which is where I got my machines, but they're insanely cheap as is. Uptime has been 100% for the past few months.


That's pretty impressive hardware for under $150


Per month, just to clarify.


renting used servers? any pointers would be helpful - $150/yr sounds cool

How technical (DevOps-y) do you have to be to keep it secure & available?


Almost none. Put CoreOS on both hosts, set yourself up an HA database, and install Rancher, which creates an IPsec network overlay (both hosts). Set up your containers on both hosts, set up HAProxy on both hosts (configuration is done only once), then set up an external DNS service that updates your DNS when a host goes down. I even run a small DNS server in 2 containers on both hosts so I don't need the external stuff; it updates automatically when hosts go down and also does some GeoIP resolution. Not complicated any more.
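For reference, the HAProxy piece of a two-host active/active setup like the one described could look roughly like this (a minimal sketch; hostnames, IPs, and ports are made up):

```
# /etc/haproxy/haproxy.cfg -- minimal active/active HTTP sketch
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend www
    bind *:80
    default_backend app

backend app
    balance roundrobin
    # health checks drop a dead host out of rotation automatically
    server host1 10.0.0.1:8080 check
    server host2 10.0.0.2:8080 check
```

The same file runs on both hosts, which is why the configuration only has to be done once.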


Using DNS for HA is a lose-lose scenario. I'll take a proper load balancer backed with a cloud provider's superior network any day.


Not if you use a low TTL. And no, I'd rather not have my traffic run through, or even terminate at, every intelligence agency on the planet, even if that means accepting a few drops in the timeframe it takes to propagate.
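To make the low-TTL idea concrete (my own sketch, names and IPs made up): the zone carries one A record per host with a short TTL, so when a failover script removes a dead host's record, resolver caches expire within that window.

```
; example.com zone fragment: two A records, 60-second TTL.
; Resolvers round-robin across both; when a health-check script
; deletes a dead host's record, clients re-resolve within ~60s.
www  60  IN  A  10.0.0.1
www  60  IN  A  10.0.0.2
```

The trade-off the parent comment raises still applies: until the cached record expires, some clients will keep hitting the dead host.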


... hence the lose-lose scenario. Having a low TTL just makes it your client's problem. If you want to be paranoid about the government, you can still use an L4 load balancer such as the one offered by GCP.

Run your infrastructure however you want. This setup would never fly for a real organization.


Just to clarify, 150/month.


Vendor? Sounds too good to be true.



