
I think the ultimate goal is to make the "acceptable entry points" so numerous and the variety of acceptable wordings so broad that you can approach the assistant with pretty much any goal you have in mind and it'll walk you through how to accomplish that.

Imagine if this was a realistic conversation with an assistant:

> "Hey Google, I'd like to order a pizza."

> "Sure, what kind?"

> "Let's see... cheese, pepperoni, sausage... and maybe some green peppers?"

> "Alright. What size?"

> "Hmm, so I need to feed 4 people..."

> "Sounds like a large?"

> "Sure, let's go with a large."

> "Alright. There's a Dominos nearby, I can order that for $8.99."

> "Sounds good."

> "Alright, I've ordered your pizza. Expected delivery in 15 minutes."

No need for the human to understand what the "entry point" is, because you can approach the assistant with pretty much _any_ entry point and it'll give you a useful response. We're still not there yet, unfortunately, and I think it'll be quite a while before we are.



You know, there are at least a dozen chatbot providers who can handle these unstructured dialogues with multiple entry points, ESPECIALLY with pizza.

In fact, the pizza order is the No. 1 scenario chatbot providers look at. It was exactly the scenario my old startup took as a case study, and the first application we built with it. It could handle the different toppings, the sizes, and more. You could submit all your requests in one move, and it would be parsed and sorted into its little slots.
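For anyone wondering what "sorted into its little slots" means in practice, here is a minimal keyword-matching sketch in Python. It is not the system described above: the vocabularies and the function names (`fill_slots`, `missing_slots`) are made up for illustration, and real frameworks would typically use a trained entity recognizer rather than regexes.

```python
import re

# Hypothetical vocabularies; a real bot would load these from the vendor's menu.
SIZES = ("small", "medium", "large")
TOPPINGS = ("cheese", "pepperoni", "sausage", "green peppers", "mushrooms")


def fill_slots(utterance: str) -> dict:
    """Sort a free-form order into size and topping slots by keyword matching."""
    text = utterance.lower()
    # First size word found wins; None means the slot is still empty.
    size = next((s for s in SIZES if re.search(rf"\b{s}\b", text)), None)
    # Toppings come back in vocabulary order, not in spoken order.
    toppings = [t for t in TOPPINGS if re.search(rf"\b{re.escape(t)}\b", text)]
    return {"size": size, "toppings": toppings}


def missing_slots(slots: dict) -> list:
    """Names of the slots the bot still has to ask about."""
    return [name for name, value in slots.items() if not value]
```

The user can say everything in one go ("a large with pepperoni and sausage") or leave things out, and the bot only asks follow-up questions for whatever `missing_slots` returns. That is the easy part; the hard part is everything that does not fit a small fixed vocabulary.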

The problem? There is only a handful of scenarios similar to the pizza. In most cases in the real world, you have to select from an external database, look at proprietary product names, and more. Another staple of chatbot demos, plane ticketing, only works well when limited to North America (in the English-speaking world). Good luck asking for a flight to Kinshasa, Kuala Lumpur, or even Wagga Wagga in Australia.

I am not even talking about the switchboards for multiple domains, like in Alexa. These ones only work with "leaky abstraction" (making the user learn magic keywords).

Another problem is really stupid. It's the availability of the datasets. The funny thing is, ye olde style semantic frameworks fare better than the machine learning ones, because there is not enough data for the machine learning chatbot frameworks, and without it, their mighty capabilities are pretty much the proverbial spherical cow in vacuum. But because the semantic paradigm is not kosher/kewl anymore, very few enterprises agree to deploy it.

None of that matters, though, because users never liked typing a lot. Back in the 1980s and 1990s, adventure games switched from a (mostly working) command-line interface to point-and-click, and very few users objected.

My take is, the key is a conversational UI with strong visual feedback. For the pizza scenario above, I would draw icons of cheese and numbers, so that the user can be sure it worked.


> plane ticketing, only works well when limited to North America

And when limited to text. I once saw a demo of a voice chat for ticket orders. It took a full minute for it to tell you the multitude of options for flying 'from the Netherlands to New York next Monday'. A human assistant can reason about what an acceptable flight may be based on a small set of parameters. A chatbot would need to know every preference in detail, like destination airport, time of day, budget, etc.


Well, on the other hand in a GUI - in pretty much a single page - I can see all options I have, all sizes, their exact price ...etc. With CMD and chatbots there's always the FOMO. "What if there are other options I haven't considered?". Humans like to be in control.


Exactly, I prefer using a GUI over calling over the phone for delivery because I might have communication or listening issues with the person on the other side of the phone.

The same thing happens at McDonald's and other places replacing cashiers with touch-screen terminals. Not only am I sure of how the system understood me, I can also easily navigate the options without annoying an employee with dozens of questions. Going from that to a conversational UI is a step back in the wrong direction.


I have an example that illustrates this.

Just this morning I stopped at McDonald's for breakfast. I gave my order, they entered it, and it showed up correctly on the screen at the drive through. However, the order taker read back something completely different (than what they just entered). Since I saw that the order in the system was correct I ignored that and simply said "sure that's right."

This is why I prefer an interface that bypasses the human order taker. They may say one thing and do another. Even in my case, they may have then thought "well he confirmed what I just said back, and that doesn't match what the screen shows, so I'll fix it to match what the customer agreed to." Then I would have the wrong order. That can't happen if I interact directly with the order system.


English is not my primary language, and I live in a country whose local language (not English) I don't speak, so I have consistent issues with miscommunication.

Once I went to a McDonald's and ordered a double cheeseburger with no onions. When the order came, it was just the buns with no meat.

Especially in a place like McDonald's where they serve several people familiar with the menu, the staff seems annoyed if you don't know what the ingredients of one of their menu items are and often don't handle specific requests well.

It's not just language issues. My ex-girlfriend always asked a lot of questions when ordering at restaurants, sometimes checking with me, and you could visibly see the waiter getting tired or annoyed because of this. Not coincidentally, there was always some minor screw-up with the order because they forgot to write something down. The same happens if you go to a restaurant with 20+ people and the waiter comes in to take everyone's order: the chances of making a mistake go up.


I know! Reading your food order verbally from a paper-printed menu to a human who needs to memorize or write it down and walk back to deliver it to the food prep area seems to be such an error-prone and ambiguous way to order a product. Doubly silly that often the waiter will simply take the order and enter it right into some kind of computer or kiosk. Just give me the damn computer! It's shocking that the vast majority of restaurants still do it this way.


Wow, I'm really shocked by all the responses. I think the majority of the world - i.e. outside of our tech bubble - would much rather deal with a human than enter their order on a computer. Particularly in a restaurant. In fact I'd go further and say that a good proportion of non-technical people would not only prefer to deal with humans, but would trust another human more than themselves with a computer interface.

Personally I'm techy and I still prefer the human aspect. In fact part of the appeal of going to a restaurant is to be waited upon - otherwise I might as well just order a takeaway online. Sure they might occasionally screw up your order but this doesn't happen nearly as often as this thread would suggest. I do eat out a lot and I honestly don't think I've had my order messed up in the last 2 years. I wouldn't say I eat at particularly posh places either though I do actively avoid most fast food establishments (not a snobby thing, I just don't like the taste of McDonalds et al) so maybe the issue of reliability is more subject to the lowest paid positions in the food service?

In any case, even if you did have your way and entered your orders directly into a computer, you'd still have to deal with the fallibility of humans with the chef cooking your food, waiter / delivery driver distributing your food, and anyone else who exists along the chain. In fact I wouldn't be at all surprised if many of the mishaps described in this thread were actually failings of those individuals rather than the order takers whom you assumed had messed the order up.


I guess it depends on what type of restaurant you're going to. If you're going to a cheap, fast service restaurant then sure, I'd like to have a tablet I could use to order from.

But if I'm at a good restaurant, a server is actually part of the service. They can give you recommendations on dishes, and wine matches, and help you get exactly what service you want. Also the human experience is just part of the fine dining experience.

It would be really strange if I went to Eleven Madison Park or Bouchon and they just handed me an iPad to order my food from.


> They can give you recommendations on dishes, and wine matches, and help you get exactly what service you want.

That just sounds like a very poor database.


It's actually a very rich database.

It's not just simple pairings, but includes nuances that would take many different fields to capture: Dish X can be made vegan, but tastes better if you then order with extra seasoning.


Discovery can be bad too. If you don't eat meat or pork or whatever, you often have to go through the menu in O(n) and look for your options. And then maybe it's a dish they stopped making months ago, but printing new menus would have been too expensive. (OK, that won't happen at McDonald's.)

With a tablet, you could filter the list with a single tap. I've thought about building such an app a few years ago because I'd love to use it, but I have no idea how you could sell it to restaurants. It seems most places are too conservative and cash-strapped to tie themselves to proprietary tech.
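As a rough illustration of the single-tap filter, here is a small Python sketch. The `MenuItem` fields, the tag names, and `filter_menu` are all assumptions made for the example, not any real PoS or ordering API.

```python
from dataclasses import dataclass, field


@dataclass
class MenuItem:
    """Hypothetical menu record; field names are illustrative only."""
    name: str
    price: float
    tags: set = field(default_factory=set)  # e.g. {"vegetarian", "no-pork"}
    available: bool = True                   # False for dishes no longer made


def filter_menu(menu: list, required_tag: str) -> list:
    """Keep only dishes that carry the tag and are still being served."""
    return [item for item in menu if item.available and required_tag in item.tags]
```

One tap on "vegetarian" becomes one call to `filter_menu`, and flipping `available` to `False` hides a discontinued dish without reprinting anything. The code still scans the whole menu, but the tablet does the O(n) pass instead of the customer.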


They're tied to proprietary tech, glasses to boots. The ability to hire a tech person is so far outside restaurant spend that they outsource to a single company, which handles all their software, their registers, and provides support. Your best bet is to sell it as a code module to the PoS integrator.


Red Robin lets you order (and pay) from a kiosk at the table. The only problem with it is you're eating at Red Robin.


I'm a native English speaker and once received a Big Mac with no meat from a McDonalds in America.


Be careful what you wish for. The Burger King at the NYS Thruway rest stop just north of NYC has one. Think McDonalds has them on Long Island, but I haven’t used one.

It’s awful — probably an order of magnitude more awful than dealing with a non-English speaking cashier. As with anything digital, it’s a sales funnel designed to trick you into ordering whatever their profit driver is. Have fun not buying a value meal.


I remember when Google rolled out a hands-free payment system at McDonalds and the identity verification step involved speaking your initials to the McDonalds employee. Despite so much automation (not even needing to pull out your phone) the system totally failed on the verbal interpretation of the initials: "Did you say 'J' or 'G'??" and gesturing the shape of the letter with fingers.


Relatedly, I've found it a little unfortunate that, whenever I'm asked to spell out something to a human, chances are the receiver has either never heard of the NATO Phonetic Alphabet or doesn't understand it.


I think most people find navigating conversation with another person much easier than navigating a GUI or computer system of any kind.


You might be right for people who have ordered a dominos pizza many times and already know exactly what they want. Rattling off your requests to a person might be faster and less stressful than navigating a UI.

But what if you don't know what your options are and how much they cost? You have to ask the person to list off the possible pizza styles, sizes, toppings, and the prices for each one. Then they have to tell you about all the available side dishes and desserts (with prices, again) and how there's a half-off deal if you get THIS side with THAT pizza on a Tuesday, and on and on and on. It'd probably take a good 20 minutes to convey all that over the phone (I hope you have a good memory or are taking notes), and by the time they're done, the poor employee is probably so frustrated that they're ready to strangle you.

Or, I can suck up all that information at a glance on dominos.com, and I won't have to repeat my credit card number over the phone 5 times before they get it right.


An innovation was made in that space — a menu.

Restaurant websites always, without exception, suck. Dominos, while the pizza is garbage, has a wonderful ordering website. But even then the actual menu is awful.

They are always a sales funnel first, menu second. They cannot give accurate ETA, ever. It’s harder to display multiple choices well on a screen vs a sheet of paper.

Unless you are a place with 5 menu items, the paper menu is superior in almost every scenario.


Well, restaurants (simple or fancy) provide an excellent analog counterexample: there's basically always a menu that you look at and select items from - the definition of a graphical user interface.


Conversing with a person tends to be more intuitive than using a machine, but also more ambiguous and sometimes more difficult to get clear information. If I already have all the information I need, or if I want an opinion instead of objective fact, then going through a person can be better.

Otherwise, they're just acting as a voice interface for a GUI that I'm perfectly able to navigate on my own, and I'm not a fan of voice interfaces when dealing with technical systems.


People prefer human interaction, but human agents are prone to miscommunication, distraction, and fatigue.


> Humans like to be in control.

I think this is an assumption that might not hold true, outside of the tech sphere.

Us? Of course we want imperative interfaces! E.g. "Order me a pizza using JSON with my pizza_now script with options -g and -b and promo code CASEOFTHEMONDAYS"

People who are not technically inclined? End-state declarative. I see most as fine saying "Order me a pizza with pepperoni" and having it show up at their door.


Absolutely. I wish engineers could use better examples to explain capabilities.

Ordering a pizza as shown in the above example is very contrived. No one needs that; a GUI is much better for this use case. But the power of chatbots will show when they can answer "Would this pizza be too spicy?" or "Can you deliver this after 4pm?". What I mean is when chatbots can take over more of the customer queries which otherwise might be directed to the store via phone call, or anything which requires deep knowledge of the product, where not every corner case can be put on the GUI menu.


Well, the command line/voice interface is really useful once you know what you want, it is less useful or even a deterrent if you are in an exploratory mode. I've had several occasions where I just want my regular pizza ordered without having to click a thousand buttons and sit through an IVR. 10 kids are coming home for a party - they'll all eat cheese pizzas with no toppings.

"Ok Google, get me 10 Margherita pizzas by this evening" is super convenient.


I'd prefer a chatbot that just serves up different GUIs based on my initial query. There would be a pizza GUI, an Uber GUI, a map GUI, a chat GUI, a YouTube GUI and so on.

Wait...


> Well, on the other hand in a GUI - in pretty much a single page - I can see all options I have, all sizes, their exact price ...etc

A while back, one of the big features Google advertised was the ability to start a transaction by voice, and then continue it on a device with a screen.


>Humans like to be in control.

I'll pay but I prefer when someone else orders a pizza.


Obvious problem is you didn’t order pizza, you ordered Dominos, and your selection is limited to whoever signs a deal with Google or pays them off for top billing.

Correcting that means now you’re arguing with Google/Alexa/Siri, which is an infuriating experience.


And now we reach a whole new level of "you'll never get [a chatbot] to understand something when [the company that produced the chatbot] getting paid depends on them not understanding it."


Ideally it would have some kind of anthropomorphized graphical avatar applicable to the context. Research out of Stanford[1] as far back as the '90s has suggested such interfaces as a means for improving human-computer interaction. If I was writing a letter, for example, perhaps an animated document fastener would be appropriate. In this case, why not an animated, anthropomorphic pizza that morphs into the Domino's logo as paid-for branding?

[1] https://web.stanford.edu/group/cslipublications/cslipublicat...


The sad part is that it might still be faster to do this online even if the conversational intelligence was so good. If I was rich and had a real human assistant, I might say "order us pizza" and they could figure out most of the details on their own. However, given that I don't, it is just quicker for me to order a pizza from my phone filling in the details rather than talking about them.


I would never do something like that. If I order a pizza, either I order it from some place that I know can do a decent pizza, or I’ll thoroughly look at the reviews of all the places that can deliver to me. Seriously...Dominos...


That's ok, you just open with "Hey Google, I want a pizza from Good Pizza House" instead. Anything you leave open is up to the assistant to try and interpret what might be a good choice.

For the average consumer who otherwise didn't specify, Dominos is a good choice: it's cheap, reliable, they understand the menu and deals, and it's consistent across time and place.


If that's the small pizza shop around the corner they'll probably not have a contract with Google.


Dominos is by no means the best pizza I have ever had, but they have gotten much better in recent years. For a fast pizza, it isn't bad.


I'm sure this is technically possible, but it may never be practical. Much like human operators, you're going to want to drive the user down a certain path, because order is often extremely important.

What if Domino's doesn't have green peppers? What if they don't like the pizza brand? What if they're asking because they have a coupon? You may end up repeating sections of the conversation multiple times, and in the end the user will just end up so confused they give up.


More importantly, how do you know Domino's (or someone else with an incentive) is not driving you down a certain path? This happens today already with ads, the Facebook feed, Instagram, etc., but there is something distinctly suspicious about taking away specific control points in the decision-making process.


What would happen if you were speaking to a real person? Imagine the most useful real person on the other end. That's the goal of these things.


What would be the adwords for this? Scanning search results and picking the result is what enables the ecosystem currently.


Pretty clearly payment for order flow. Notice the Domino's suggestion.


Agreed that Domino's Pizza would pay for this referral. The key difference, though, is that search engines generate more potential bidding on keywords because they render multiple results.


This isn't that far away. If Dominos had an external API you could build this right now.

The only difference is that you would want a confirmation step with the VUI - i.e. "Alright, you want a large cheese, pepperoni, sausage, & pepper pizza from Dominos, which will take 15 minutes to deliver. Place the order?"
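A minimal sketch of that confirmation step in Python, under stated assumptions: `confirmation_prompt`, `is_confirmed`, and the list of accepted replies are invented for the example, and nothing here talks to a real Dominos endpoint.

```python
# Replies treated as an explicit "yes"; purely illustrative, not exhaustive.
AFFIRMATIVE = {"yes", "yeah", "yep", "sure", "sounds good", "place the order"}


def confirmation_prompt(size: str, toppings: list, vendor: str, eta_minutes: int) -> str:
    """Read the whole order back so the user can catch mistakes before paying."""
    topping_text = ", ".join(toppings) if toppings else "plain"
    return (f"Alright, you want a {size} {topping_text} pizza from {vendor}, "
            f"which will take {eta_minutes} minutes to deliver. Place the order?")


def is_confirmed(reply: str) -> bool:
    """Only an explicit yes counts; anything else leaves the order unplaced."""
    return reply.strip().lower().rstrip(".!") in AFFIRMATIVE
```

The VUI would speak the prompt and only call whatever ordering backend it has once `is_confirmed` returns `True`. Defaulting to "not confirmed" on anything ambiguous seems the safer choice when money is involved.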


They have a website. https://github.com/RIAEvangelist/node-dominos-pizza-api has a node wrapper for the website.


You can't even (and probably don't want to) do that with a real human. Why do we think it'll one day become possible or desirable with a computer?


Yeah I don't understand, I can literally call up a pizzeria and do this exact song and dance today whenever I want. I prefer ordering online without talking to anyone. I fail to see the benefit for consumers.

I do see the benefit for employers.


"I don't think waitresses have anything to worry about."

https://www.youtube.com/watch?v=NXsUetUzXlg


It still looks pretty much like Zork:

> take lamp



