Anonymous Search Engine (tuxdex.com)
90 points by TomZero on Aug 15, 2022 | 53 comments


Cool to see a new general purpose search engine with its own crawler! I tried a couple of queries and it worked fine for me. Searching for inetdex (https://www.tuxdex.com/?q=inetdex) returns 1,442 documents that apparently just reflect the crawler's user agent, though. Might need some tuning. Can you (assuming OP is affiliated with the site) say something about the team and business model, if any? It would also be interesting to hear what infrastructure you're running on and whether there are any plans to open source the crawler.

I really like the minimal/90s look and feel too--keep up the good work, assuming it's not a honeypot! :^)


Yup, a plus from me about the clean 90's look, hope it will stay that way.

Searched for a term, gives +400k results, up to page 80 and still going. Another big plus from me since I just love to search low ranked results, sometimes you find true gems in obscure pages.


It already returns 2,816 documents. No, that is not a honeypot!


I searched "how to center a div" and seem to have found a bug.

It looks like the html tags from one of the search results messed up the page


I can't see any error for the search query "how to center a div", maybe it has been fixed. https://tuxdex.com (Anonymous Search Engine)


It's so anonymous, it's not even filtering html tags.


Of course HTML tags are filtered; it is possible that this function is not yet perfect.
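
One common way to keep markup in a result title or snippet from breaking the results page is to escape every field at render time. A minimal sketch in Python, assuming a hypothetical `render_result` helper and field names; nothing here reflects how TUXDEX actually renders results:

```python
from html import escape


def render_result(title: str, url: str, snippet: str) -> str:
    """Return one search-result entry as HTML with all fields escaped.

    Hypothetical helper for illustration; the field names are assumptions.
    """
    # escape() replaces &, < and >, and with the default quote=True
    # also " and ', so the values are safe inside attributes too.
    safe_title = escape(title)
    safe_url = escape(url)
    safe_snippet = escape(snippet)
    return (
        f'<li><a href="{safe_url}">{safe_title}</a>'
        f"<p>{safe_snippet}</p></li>"
    )


if __name__ == "__main__":
    print(render_result("How to center a <div>",
                        "https://example.com/?a=1&b=2",
                        "<script>alert(1)</script>"))
```

Escaping only neutralizes markup; it does not check the URL scheme, so a real renderer would also want to reject things like `javascript:` links.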


Pretty impressive.

The search results come very fast, but there's some low hanging fruit QOL optimizations as far as I can see, mostly to do with search result ranking.

It doesn't seem to weight the terms properly: there is some pull toward relevant terms, but not as much as I'd expect from BM25. I also think it doesn't promote results with adjacent terms enough. As a result, the relevance of the search results sort of undulates up and down as you go through them.
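
For reference, this is roughly the term weighting meant here. A minimal sketch of Okapi BM25 in Python over pre-tokenized documents, using the non-negative idf variant; the function name and the default `k1` and `b` values are assumptions for illustration, not anything taken from the site:

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a larger idf, so they pull the ranking harder.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

With this kind of weighting, a document that matches the rare terms of a query outranks one that merely repeats a common term many times. It does not reward adjacent terms; that would need a separate proximity or phrase signal.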

The de-duplication also seems non-existent. You can easily get a page full of duplicates of the same two links. You get pretty far if you just keep a simple hash of the HTML of each website or whatever.
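
A minimal sketch of that simple-hash idea in Python. The function name, the `(url, html)` input shape, and the whitespace normalization are assumptions for illustration:

```python
import hashlib


def dedupe_by_content(pages):
    """Keep the first (url, html) pair for each distinct page body."""
    seen = set()
    unique = []
    for url, html in pages:
        # Collapse whitespace so trivially different copies hash the same.
        normalized = " ".join(html.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        unique.append((url, html))
    return unique
```

An exact hash only catches copies that are identical after normalization. Pages that differ by a timestamp or an ad slot would need a near-duplicate technique such as shingling or SimHash.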

This is made worse by what I think is a bug where the crawler follows redirects and reports them as the original URL and not the redirected URL. I actually had the same bug in my crawler when I started out. Nasty anti-synergy with Wikipedia, which has a lot of redirects.
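
A minimal sketch of the fix in Python, assuming a crawler built on the standard library's `urllib`; the `index` dict and both function names are hypothetical. The point is to key the stored page by the URL the fetch ended on, not the one that was requested:

```python
import urllib.request


def fetch_final_url(url, opener=urllib.request.urlopen):
    """Fetch `url` and return (final_url, body) after any redirects."""
    with opener(url) as response:
        # urlopen follows redirects; response.url is where we ended up.
        return response.url, response.read()


def index_page(index, requested_url, opener=urllib.request.urlopen):
    """Store the page under its post-redirect URL, not the requested one."""
    final_url, body = fetch_final_url(requested_url, opener)
    # Several requested URLs that redirect to the same page now collapse
    # into a single index entry instead of producing duplicates.
    index[final_url] = body
    return final_url
```

The `opener` parameter exists only so the fetch can be swapped out in tests; by default it is plain `urllib.request.urlopen`.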

These are relatively minor issues that shouldn't be hard to fix.


What's QOL?


quality of life


Thank you for the many posts and for visiting TUXDEX.COM.

TUXDEX has received an update of both the software and the indexes. Thanks to bacon-and-stars we have been able to fix the error, and TUXDEX is safe from XSS and other attacks.

About the results: We are working on our algorithm and other methods that will improve the results. Please keep in mind that the well-known search engines sometimes employ thousands of people and have been on the market for years.

http://tuxdex.com (Anonymous Search Engine)


Results for the query: how+does+tuxdeo+make+money
Instead of: how does tuxdex make money
Result page: 1
0 Documents found in 0.27 seconds


This. I'm glad these guys are running their own crawler, but their story doesn't add up.

Most likely this is an outfit that does a lot of webscraping and wants to minimize the amount of manual banning they incur. So they set up a plausible indie search engine as a front. Of course the search engine has to actually work when the scrapees notice and try it out.


TBH, the comments here are way more positive than I expected.

And the search engine is actually pretty decent (from my test).


Is this open source, and if not how do they prove anonymity?


Even if it was open source, how would they prove anonymity?


I don't remember exactly, but it's possible to create a completely anonymous search engine. The user encrypts the query with their key and sends it to the search engine, which now doesn't know the query. The search engine then encrypts its entire database with the user's public key and checks whether the hashes match for keywords? Something like that.


I think you might be referring to homomorphic encryption. If I recall, the issues with it are that the operations are expensive and it has a large overhead.


Also, without some sort of tracking, how are you going to block bots without also being anti-user?

You could easily get 100 bot requests for every 1 user request.


Our search server receives the request from the web server and returns the results. The web server, which could store the user IP, does not know the search query, since the POST method is used for transmission. The search server that processes the request does not know the user IP, only the web server.

The logs will be deleted after the visitor numbers have been determined. Your search query remains anonymous at https://tuxdex.com.


Easy: just don't block bots.

Problem solved.


Zero Tracking for news: https://yup.is


The search queries and the results are encrypted and the query is not visible in the address bar.


You can append queries with `q`

https://www.tuxdex.com/?q=query


You could also try https://private.sh which anonymizes your connection by using a two party system [1].

[1] https://private.sh/how-it-works.html


"Maintained by Private Internet Access".

PIA is owned by Kape Technologies, https://en.wikipedia.org/wiki/Teddy_Sagi#Kape_Technologies_p...


Hangs when I search for this:

    https://www.tuxdex.com/?q=*.*


Why do you want to search for *.* why not ;*)


Sorry for the joke. You can, for example, search for domain:*.com or domain:*.*. In the first step, the index will be built up to half a billion documents. We have already indexed more than 15 million domains, some with more than 10,000 pages per domain. One million PDF documents are already in the index, and the good thing is that you can access all the results, unlike most search engines, which only show you a few hundred.


"the query is not visible in the address bar"

What a weird thing to put as a feature.


It keeps your queries out of your browser history.


if the query isn't in the url then it isn't in the access log either
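
To make that concrete, a minimal sketch in Python of what a form submission looks like on the wire with each method. The helper name is hypothetical, and the remark about logging refers to common default access-log formats, which record the request line but not the body:

```python
from urllib.parse import urlencode


def request_line_and_body(method, path, params):
    """Return the HTTP request line and body for a form submission."""
    encoded = urlencode(params)
    if method == "GET":
        # The query string is part of the request target, which is what
        # access logs, browser history and referrers typically capture.
        return f"GET {path}?{encoded} HTTP/1.1", ""
    # For POST the parameters travel in the body instead.
    return f"POST {path} HTTP/1.1", encoded


if __name__ == "__main__":
    for method in ("GET", "POST"):
        print(request_line_and_body(method, "/", {"q": "embarrassing search"}))
```

Either way the server still receives the query and could choose to log it; POST only keeps it out of the places that record URLs.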


Interesting fact you learn when you create a GP search engine - getting reasonable results for “gmail” and “google news” is not trivial )


Unfortunately, doesn't seem to be open source.

For me personally, the mainstream independent search engine is Brave Search; it's genuinely good.


I've been well served of late by SearX, specifically instances running the new-ish SearXNG (fork?).

SearX is a search aggregator rather than having its own crawler, and as such a number of its instances get temporary suspensions for sending too many requests in too short a time to the various search engines they query. I still get good enough results from it.


I use SearXNG daily too!

But in terms of a real search engine, I find Brave to be the best.

If I didn't use SearX, I would probably be using Brave.

The SearX instance I use uses Brave and Google by default.


I've just started using SearXNG. How often (if at all) do they add smaller search engines like this one as options in your experience?

I was hoping to be able to use some esoteric ones, but that doesn't seem possible, unless I've overlooked something.

Being able to aggregate a couple of these up-and-coming engines with result deduplication sounds really neat.


I honestly don't pay that much attention, but they do have a decent list of potential search sources, so it's not something they're lazy about.

Additionally, this new search site (tuxdex.com) appears to just be a private/anonymous version of their existing search site (websearchengine.net) - although websearchengine.net isn't listed as a source for SearXNG either.


Is it better than DDG? My experience with it for the time being (last 2 years ish) has been mostly having to fall back to Google multiple times a day.


DDG offers a Tor onion address.

That's a feature I like, great for privacy and anonymity, relatively easy to implement, and affords some resistance to bots (who will move on to lower hanging fruit due to the connection set-up overhead)

Hope this new search engine will also offer an onion address.


TUXDEX will also provide an onion address in the near future.


IMO, yes, the results are much better, and it's actually independent.

The Goggles and Discussions features are great.


> Your search queries and the results are encrypted

Does this just mean that it uses SSL/TLS?

> the query is not visible in the address bar.

Why would this be a privacy gain?


WRT the query not in the address bar - this is probably because then something like tuxdex.com/?terms="embarrassing search" doesn't show up in history or suggestions.


Also to handle the case of browsers sharing the referrer URL, though Firefox only shares the domain, and noreferrer HTML and security tags exist as well.


Similar pages are not filtered[0]. Some queries return very weird results[1]. Also there is no way to add pages to be indexed. Overall 1000 times better than my private search engine, keep up the good work!

[0] https://www.tuxdex.com/?q=git

[1] https://www.tuxdex.com/?q=whale%20trades


Doesn't do spelling correction.

Not as good as Google at figuring out which terms are important.

    lucene explain api
weights API heavily and doesn't return reasonable answers. On Google, it's the first hit.


> Doesn't do spelling correction.

For me this feels like a feature. Or just suggest a correction. I don't want this to search "corrected" keywords by default like Google does.
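
A minimal sketch of the suggest-only approach in Python, using the standard library's `difflib`. The function name, the `vocabulary` list (assumed to come from terms already in the index), and the cutoff value are all assumptions for illustration:

```python
import difflib


def suggest(term, vocabulary, cutoff=0.75):
    """Return a close vocabulary term to offer as a suggestion, or None.

    The caller still searches for `term` exactly as typed; the return
    value is only meant for a "did you mean ...?" hint.
    """
    if term in vocabulary:
        return None
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Because the function never rewrites the query, the user keeps full control: the results shown are for what was typed, and the suggestion is an optional link.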


The index is not full yet; our crawler is constantly working. Today we indexed about 5,000 documents from the https://apache.org domain, but unfortunately this document is not in the index yet. I can promise you that we will include at least 100,000 documents from apache.org.


You might want to fix the XSS issue...


thank you, fixed


The search results are horrible. I typed a four-letter word and it only showed me results for each letter individually...


I'm sorry you were disappointed. What word was it? We're going to have a look.

https://tuxdex.com (Anonymous Search Engine)



