We're building a search engine to compete with DuckDuckGo. No JS, no WASM, no spying. Just a statically generated results page.

UnHidden@lemmy.world · 7 months ago

We're building a search engine to compete with DuckDuckGo. No JS, no WASM, no spying. Just a statically generated results page.

Pantherina@feddit.de · 7 months ago

Wow this is great!

if you are using your own index, I think you could use a more economical approach to fight the spam bullshit of the modern web.

instead of using badness enumeration, crawling everything and filtering malware, use an opt-in principle
have a community method of gathering new trusted websites
use websites internal search functions to get more results
use categories to split up the websites, reinventing what people should find: general, news, navigation, science, politics, IT, technology (not code), art, music, philosophy, …
have an app or submission website where users can submit new websites, and some form of community control over it (kinda censorship but in a good way)
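
A minimal sketch of the opt-in idea above, assuming a community-maintained allowlist that maps hostnames to categories. The hosts, categories, and function names here are hypothetical and only illustrate the shape; nothing in the thread says the project works this way.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical community-curated allowlist: hostname -> category.
TRUSTED_SITES = {
    "en.wikipedia.org": "general",
    "arxiv.org": "science",
    "lwn.net": "IT",
}


def host_of(url: str) -> str:
    """Return the lowercase hostname of a URL, or '' if it has none."""
    return (urlsplit(url).hostname or "").lower()


def should_crawl(url: str) -> bool:
    """Opt-in rule: crawl a URL only if its host is on the allowlist."""
    return host_of(url) in TRUSTED_SITES


def category_of(url: str) -> Optional[str]:
    """Return the category assigned to the URL's host, or None if untrusted."""
    return TRUSTED_SITES.get(host_of(url))
```

The point of the design is that an unknown host is skipped by default, so spam has to get past the community submission step rather than past a malware filter.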

This could fix the web as it currently is, by rethinking what should be found, pushed etc. Rating websites by quality could also be helpful.

Also if you support payments in crypto or cash, there should be no problem to make it paid.

Onno (VK6FLAB)@lemmy.radio · 7 months ago

I love the notion. The marketing “better than DDG” is a little janky. Perhaps consider a positive statement, like “finally find what you’re looking for”.

This is a crowded landscape. I’ve been here since Gopher and seen plenty of services come and go. With that in mind, here are some questions you might want to consider:

How does it compare with products like SearXNG, specifically their ecosystem of plug-in search types?

How do you plan to pay for it?

How do you expect to protect the index against spam?

How will you scale it to a global audience?

How will you handle language?

Good luck!

UnHidden@lemmy.world · 7 months ago

To answer your questions in order:

We have our own index; it’s not a shitshow of mixed results like Searx tends to be. This also means that we’re not chasing breaking changes of some larger engine when they decide they don’t want us, like Twitter did to Nitter, and Bing did to Searx.
We don’t know how to monetize. Ads are the only option that we know of, donations do not work at all, as proven by my previous projects.
We’ve already got spam prevention and removal measures in place, but I won’t discuss them.
We don’t know how to scale it, since it’s centralized by design and the frontend and backend are tightly integrated, mainly because the frontend is generated on the fly by the backend. Maybe host a copy for each region we’re aiming to acquire users from?
Our engine already understands 5 languages, and we hope to expand to CJK languages soon.

WetBeardHairs@lemmy.ml · 7 months ago

You could let people host their own as a method of scaling. But that limits it to geeks like us.

Use Kubernetes and let it scale, and pay for hosting on CDNs.

hydroptic@sopuli.xyz · edit-2 7 months ago

We don’t know how to monetize. Ads are the only option that we know of, donations do not work at all, as proven by my previous projects.

A subscription-based model might be the only viable one, since ads will inevitably lead to a conflict of interest and voluntary donations are mostly a no-go. The problem is that people are so used to the notion that everything is “free” that many are convinced that online services should always be free and balk at the idea of paying for anything.

Personally I pay for Kagi which has been decent enough

Amerikan Pharaoh@lemmygrad.ml · edit-2 7 months ago

I mean, a search engine is literally the last thing on the internet I’d pay a subscription for. In a world where literally everything else nickels-and-dimes us for subscription service, search engines, torrent trackers, game modders who paywall their mods, and other kitschy non-essentials are literally the first things to get shuffled off the monthly budget.

If we weren’t in such a deep recession that I pay as much a week for my gas as I do my groceries, with rent and ACTUAL bills eating the majority of what’s left, I’d feel a bit differently; but if wishes were horses, we’d all ride. I literally had to start growing my own green rather than buying it, the economy’s so shit.

Orbituary@lemmy.world · edit-2 7 months ago

Personally I pay for Kagi which has been decent enough

What’s “decent enough” mean? I’ve been curious and you’re the only person I’ve known who pays for it.

pacmondo@sh.itjust.works · 7 months ago

I pay for it. The results are quality, and the fact that my brain doesn’t have to sift through ad results and can just look at the real data is so nice. Additionally, they have a large number of “lenses” which can change the scope of your search. For example, they have a lens for searching Lemmy as well as lenses for the “small web”, which filters out all the results from massive corporate websites and gives way more personal project sites and the like.

All in all I’m a fan.

ParetoOptimalDev@lemmy.today · 7 months ago

I never thought I’d pay for Kagi, and I thought paying for a search engine was ridiculous. Then I kept seeing loudly positive feedback from reputable people in my circle and tried the trial.

I pay for it and never have the “I only ever use !g on duckduckgo” problem.

Sorting by web pages with least ad trackers is a cheat code to find old style websites with people sharing knowledge for knowledge’s sake rather than profit.

KoboldCoterie@pawb.social · 7 months ago

The problem is that people are so used to the notion that everything is “free” that many are convinced that online services should always be free and balk at the idea of paying for anything.

A huge part of that is that most people don’t consider privacy concerns to be a cost. All they factor into their evaluation is whether it costs them actual money.

My Password Is 1234@lemmy.world · 7 months ago

I got so excited reading this post, but as I read that the project will not be open source, my excitement immediately faded away

CameronDev@programming.dev · 7 months ago

Pages are statically generated

Can you elaborate on that? To me, statically generated would mean you are pre-rendering an HTML page for every possible search, which doesn’t sound possible? Do you mean that it’s all server-side generated (at the time of search)?

blujan@sopuli.xyz · 7 months ago

I think he means pages are presented as static HTML+CSS pages, generated dynamically on the back end
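
A minimal sketch of that reading, assuming the backend builds the whole page as a string at query time and sends plain HTML with no scripts. The function name and page layout are hypothetical; OP’s backend is closed source, so this is only an illustration of the general technique.

```python
from html import escape


def render_results(query: str, results: list[tuple[str, str]]) -> str:
    """Build a complete results page as plain HTML, with no <script> tags.

    `results` is a list of (title, url) pairs. All user-controlled text is
    escaped so a query or title cannot inject markup into the page.
    """
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in results
    )
    return (
        "<!doctype html>\n"
        f"<title>{escape(query)} - results</title>\n"
        f"<ol>\n{items}\n</ol>\n"
    )
```

The browser receives a finished document, so from the client’s side the page is static even though the server generated it per request.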

Lemongrab@lemmy.one · 7 months ago

Closed source and privacy most of the time don’t mix. Or more so the privacy crowd and closed source don’t mix. You won’t see much support for your project if it remains like that. Maybe a source-available but still closed license would be better. Think about your monetization strategy a bit as well. Consider having premium features and make it a freemium product.

Rob@lemmy.world · 7 months ago

How does it compare with Kagi?

Lunya \ she/it@iusearchlinux.fyi · 7 months ago

Anyone else kinda sketched out by the title for Veritasium saying “Veratasium”, or the title for dark.fail saying “dark.fail: is a .onon site online?” instead of “dark.fail: Which Tor sites are online?”?

ExtremeDullard@lemmy.sdf.org · edit-2 7 months ago

I applaud your efforts and I admire your idealism.

Unfortunately, the minute you get the bill from your internet provider, you’ll need to find a way to pay for it, and your good intentions will instantly dissolve in the murky realities of modern corporate surveillance capitalism.

But at least while you haven’t gotten your first bill, it’s refreshing to watch your enthusiasm.

pixelscript@lemmy.ml · 7 months ago

My thoughts exactly when reading this.

I believe people when they claim to develop free software. Often because it’s software the dev wants for themselves anyway and they’ve merely elected to share it rather than sell it. The only major cost is time to develop, which is “paid” for by the creation of the product itself.

You (OP) are proposing a service. Services have ongoing fees to run and maintain, and the value they create goes to your users, not you. These are by definition cost centers. You will need a stable source of funding to run this. That does not in any way mix with “free”. Not unless you’re some gajillionaire who pivoted to philanthropy after a life of robber baroning, or you’re relying on a fickle stream of donations and grants.

You indicate in other comments you will not open the source of your backend because you don’t want someone scooping it from you and stealing your future revenue. That’s fine, but what revenue? I thought this was free? What’s your business model?

It sounds like what you want to do here is have a free tier anyone can use, supported by a paid tier that offers extended features. That’s fine, I guess. But if you want to “compete with DuckDuckGo”, you are going to need to generate enough revenue to support the volume of freeloaders that DDG does. If your paid tier base doesn’t cover the bill, you will need to start finding new and exciting ways to passively monetize those non-revenue-generating users. That usually means one or more of taking features away and putting them behind the paywall to drive more subscriptions, increasingly invasive ads on the platform, or data-harvesting dark patterns.

Essentially what I’m saying here is, as-proposed, the eventual failure and/or enshittification of your service seems inevitable. Which makes it no better than DDG long term.

It is, at any rate, a very intriguing project.

sugar_in_your_tea@sh.itjust.works · 7 months ago

pay for it

I wonder what a distributed search engine would look like. Basically, the index would be sharded across user computers, and queries would hit some representative sample of that index. This means:

hosting costs are very low - just need a way to proxy requests to the network
search times should improve as more people use the service
no risk of the service logging anything - individual nodes don’t need to know who requested the data, just who to send the response to
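
A rough in-memory sketch of the idea above, assuming each peer holds a slice of an inverted index and a query asks a random sample of peers, then merges what comes back. The class and function names are hypothetical, and real networking, trust, and privacy concerns are left out entirely.

```python
import random
from collections import Counter


class Node:
    """One peer holding a slice of the index: term -> set of URLs."""

    def __init__(self):
        self.postings = {}

    def add(self, term, url):
        self.postings.setdefault(term, set()).add(url)

    def lookup(self, term):
        return self.postings.get(term, set())


def query(nodes, term, sample_size, rng=random):
    """Ask a random sample of peers; rank URLs by how many peers returned them."""
    sample = rng.sample(nodes, min(sample_size, len(nodes)))
    votes = Counter()
    for node in sample:
        votes.update(node.lookup(term))  # each peer contributes one vote per URL
    return [url for url, _ in votes.most_common()]
```

Because only a sample is queried, results can differ between requests; that trade-off is part of what makes the approach cheap to host.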

My biggest concern is how to build the index, but if OP is willing to share that, I might start hacking on a distributed version.

grue@lemmy.world · 7 months ago

Don’t start new; contribute to what already exists: https://en.wikipedia.org/wiki/YaCy

Waraugh@lemmy.dbzer0.com · 7 months ago

This is really neat and I’m just hearing about it after over twenty years of development. I need to try it out, thank you. How do you stay in the know about this kind of stuff? I’m curious about all the cool stuff out there I wouldn’t even know I’m curious to find.

grue@lemmy.world · 7 months ago

How do you stay in the know about this kind of stuff?

By being terminally online, I guess?

More concretely, I’ve spent (probably too much) time on Slashdot, Reddit and now Lemmy over the years (subscribed to Free Software and privacy-related communities in particular). Also, looking through sites like https://awesome-selfhosted.net/ and https://www.privacytools.io/, wiki-walking through articles about Free Software projects on Wikipedia, browsing the Debian repositories, etc.

I’m sure there are plenty of things I haven’t heard of either, though.

ElectroVagrant@lemmy.world · edit-2 7 months ago

How do you stay in the know about this kind of stuff? I’m curious about all the cool stuff out there I wouldn’t even know I’m curious to find.

I was going to mention YaCy as well if nobody else was, so I can chip in to this somewhat. My method is to keep wondering and researching. In this case it was a matter of being interested in alternative search engines and different applications of peer to peer/decentralized technologies that led me to finding this.

So from this you might go: take something you’re even passingly interested in, try to find more information about it, and follow whatever tangential trails it leads to. With rare exceptions, there are good chances someone out there on the internet will also have had some interest in whatever it is, asked about it, and written about it.

Also be willing to make throwaway accounts to get into the walled gardens for whatever info might be buried away there and, if you think others may be interested, share it outside of those spaces.

sugar_in_your_tea@sh.itjust.works · 7 months ago

Awesome! That’s pretty much exactly what I’m looking for, though I’m interested to see how easy it is to limit certain peers to certain functions. Not everyone has resources to crawl and index pages, but a lot of people can store the index.

I’m interested in having client-side web storage, so you can participate in the network by just having the search page open (opt-in of course).

I’m honestly not actively working on it, but if OP provides the database and/or crawler, I’ll do some research on feasibility.

octopus_ink@lemmy.ml · 7 months ago

I wonder what a distributed search engine would look like.

Isn’t that what Searx is/can be?

https://en.wikipedia.org/wiki/Searx#Instances

I admit it’s not something I’ve looked closely at.

grue@lemmy.world · 7 months ago

No, Searx is a metasearch engine that queries and aggregates results from multiple normal search engines (Google, Bing, etc.)

A distributed search engine would be more like YaCy, which does its own crawling and stores the index as a distributed hash table shared across all instances.
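
To make the “distributed hash table” part concrete, here is a small sketch of consistent hashing, one common way to decide which peer stores the entries for a given term. This is not YaCy’s actual implementation; the `Ring` class, the peer names, and the choice of SHA-256 are assumptions for illustration, and the sketch assumes at least one peer.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a point on the ring using the first 8 bytes of SHA-256."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Consistent-hash ring: a term belongs to the first peer clockwise from it."""

    def __init__(self, peers):
        # Requires a non-empty list of peer names.
        self._points = sorted((ring_position(p), p) for p in peers)
        self._keys = [pos for pos, _ in self._points]

    def peer_for(self, term: str) -> str:
        # Find the first peer position after the term, wrapping around the ring.
        i = bisect.bisect_right(self._keys, ring_position(term)) % len(self._points)
        return self._points[i][1]
```

For example, `Ring(["peer-a", "peer-b", "peer-c"]).peer_for("linux")` always returns the same peer, and removing some other peer from the ring does not move that term, which is why this scheme copes with peers joining and leaving.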

sqw@lemmy.sdf.org · 7 months ago

i feel that decentralized search is an extremely valuable thing to start thinking about. but the devil is in practically every one of the details.

sugar_in_your_tea@sh.itjust.works · 7 months ago

Yup. Even if you trust all your peers (which isn’t reasonable), there’s still a ton of practical issues that need to be resolved:

pagination with a different set of peers
moderation of CSAM and whatnot
outdated peers and stale data
how much data and where are results reduced

It’s a really complex problem without getting p2p involved, and p2p just adds a ton of other problems.

So I’m probably going to stick with building my Reddit clone, which I think is simpler (search doesn’t need to happen at the start).

UnHidden@lemmy.world · 7 months ago

For now we’re going to host on residential connections, and if any ISPs ban us, we’ll just find other ISPs

fishos@lemmy.world · 7 months ago

Yeah, when you say stuff like this, it shows how woefully unprepared you are for the realities of this. You can’t scale, can’t self host for long, don’t see a way to pay for this… When I can already pay Kagi for a fully working, excellent service, why would I choose you? This is guaranteed to crash and burn the moment your ISP tells you you can’t run a commercial grade server through your residential connection. They’ll either cap your bandwidth to unusable levels or disconnect you entirely. If you’re lucky you’ll have 1 or 2 other options to choose from, who will blacklist you shortly after. Then, after you’ve burnt through all the “easy” ways to host, all you’ll be left with is professional grade services that you admit you can’t afford.

Also, you make zero mention of user privacy. So what happens when you get your first subpoena? Or before that, why should I trust you with my data in general? What policies do you have in place to ensure my legal rights are protected? Do you even know what the legal rights are per state/country and how the location of where someone connects from impacts you? How are you gonna handle visitors from the EU with GDPR?

Nifty idea, but way too much “I’m gonna single handedly reinvent the wheel” vibes.

octopus_ink@lemmy.ml · 7 months ago

Would make Richard Stallman smile :)

If this is a closed source project, that statement doesn’t work even as a joke.

UnHidden@lemmy.world · 7 months ago

That comment is there specifically to drive engagement up with all of the people correcting me in the comments.

rar@discuss.online · 7 months ago

Ah, the 4chan method of engagement, right?

Railcar8095@lemm.ee · 7 months ago

Lying is a great way to get engagement in the post, and then see your project crash and burn.

I’m only interested in your rant in a few weeks when nobody cares.

Possibly linux@lemmy.zip · 7 months ago

Richard Stallman cares more about what is running on your computer than he does about what is running on a server.

Fair point though

NoLifeGaming@lemmy.world · 7 months ago

Sounds interesting! I saw some other guy post about how you guys wouldn’t pick pro-Ukrainian content over pro-Russian and I think that’s the right thing to do. I always found it “interesting” that YouTube will always promote the legacy media (in my eyes akin to propaganda) whenever you search for news or current events. Look forward to seeing where this goes and I hope you have an open policy about decisions in the search engine about what you promote vs demote. Who knows what else the other engines are promoting when people search to skew their views.

ProdigalFrog@slrpnk.net · 7 months ago

Ahh, you’re the guys who posted over on Reddit, before your thread got locked, who think it’s a good idea to promote Russian propaganda equally with Ukrainian content, because you don’t want to ‘Take sides’ politically. Closed source too, so that’s pretty much a dealbreaker right there, especially for privacy-focused users. We’ve been abused by closed source software for far too long to trust anything less.

You also have absolutely no plan on how to monetize, as others have said in this thread already.

I certainly won’t be supporting you, not with those values.

PrincessLeiasCat@sh.itjust.works · 7 months ago

Thank you for taking the time to point this out.

Chadus_Maximus@lemm.ee · edit-2 7 months ago

Sounds great! Where do I fill in my sensitive data?

Lunya \ she/it@iusearchlinux.fyi · 7 months ago

Would make Richard Stallman smile :)

source (code)?

Lemongrab@lemmy.one · 7 months ago

Closed source

Lunya \ she/it@iusearchlinux.fyi · 7 months ago

octopus_ink@lemmy.ml · 7 months ago

Closed source

Yeah, not sure how they can include that line about Stallman with a straight face. That’s almost libel.

Robert7301201@slrpnk.net · edit-2 7 months ago

https://lemmy.world/comment/8535938

They just said that to “drive engagement”.

Lemongrab@lemmy.one · 7 months ago

Agreed

wischi@programming.dev · 7 months ago

“Only two crates used”. What’s great about reinventing the wheel? A closed source project with big claims trying to reinvent everything from scratch. Nice project 🤣

Mubelotix@jlai.lu · 7 months ago

Every dependency is a security hole

Sotuanduso@lemm.ee · 7 months ago

I don’t know DuckDuckGo, but what’s the purpose of trying to compete with it? This is not a rhetorical question. Is there something wrong with DuckDuckGo, something you feel you can do better, or are you just making a competitor for the principle?

space@lemmy.dbzer0.com · 7 months ago

Not OP, but there is value in having competition. DDG is just a bing front-end. The big search engines have a major problem with the quality of results going down, as the internet is SEOd to death. The companies behind these engines don’t seem to be very eager to fix it, they are just hoping to replace them with AI. We’ve also seen how these engines have been turned into ad platforms, which changes the incentives… Instead of ranking quality, they are ranking who pays more.

Taking a different approach to ranking results that isn’t ad driven, and that can punish AI-generated content and low-quality results, would bring huge value.

ShortN0te@lemmy.ml · 7 months ago

DDG is just a bing front-end.

That is wrong. Yes, they are licensing the Bing search database, but it is not the only one they use. They have their own crawler too.

source