I’m a tech-interested guy. I’ve touched SQL once or twice, but wasn’t able to really make sense of it. That, combined with not having a practical use for it, leaves SQL as largely a black box in my mind (though I am somewhat familiar with technical database concepts).

With that, I keep seeing [pic related] as proof that Elon Musk doesn’t understand SQL.

Can someone give me a technical explanation for how one would come to that conclusion? I’d love if you could pass technical documentation for that.

  • vorb0te@lemmynsfw.com
    7 days ago

    He could also be referring to the mere possibility of duplicates, which does not mean there are any. And even then, they could be there by accident. Of course, DB design could prevent this. But I guess he is inflating the importance of this issue.

  • nednobbins@lemm.ee
    7 days ago

    It’s so basic that documentation is completely unnecessary.

    “De-duping” could mean multiple things, depending on what you mean by “duplicate”.

    It could mean that the entire row of some table is the same. But that has nothing to do with the kind of fraud he’s talking about. Two people with the same SSN but different names wouldn’t be duplicates by that definition, so “de-duping” wouldn’t remove them.

    It can also mean that a certain value shows up more than once (eg just the SSN). But that’s something you often want in database systems. A transaction log of SSN contributions would likely have that SSN repeated hundreds of times. It has nothing to do with fraud, it’s just how you record that the same account has multiple contributions.

    A database system as large as the SSA’s needs to deal with all kinds of variations in data (misspellings, abbreviations, moves, siblings, common names, etc). Something as simplistic as “no dupes anywhere” would break immediately.
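
To make the two meanings of "duplicate" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `contributions` table, its columns, and the sample values are invented for illustration; nothing here reflects the SSA's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical transaction log: one row per contribution.
conn.execute(
    "CREATE TABLE contributions (ssn TEXT, paid_on TEXT, amount_cents INTEGER)"
)
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?, ?)",
    [
        ("123-45-6789", "2024-01-31", 50000),
        ("123-45-6789", "2024-02-29", 50000),  # same SSN, new month: legitimate
        ("123-45-6789", "2024-02-29", 50000),  # exact copy of the previous row
        ("987-65-4321", "2024-01-31", 42000),
    ],
)

# Meaning 1: whole-row duplicates. DISTINCT only collapses rows that
# match in every selected column.
distinct_rows = conn.execute(
    "SELECT DISTINCT ssn, paid_on, amount_cents FROM contributions"
).fetchall()

# Meaning 2: a single value repeated. In a log this is expected,
# because every payment is its own row.
rows_per_ssn = dict(
    conn.execute("SELECT ssn, COUNT(*) FROM contributions GROUP BY ssn").fetchall()
)

print(len(distinct_rows))  # 3
print(rows_per_ssn)
```

Only the exact copy disappears under meaning 1, while the first SSN still shows up on several rows under meaning 2, which is normal for this kind of table.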

      • nednobbins@lemm.ee
        6 days ago

        Yeah. And the fix for that has nothing to do with “de-duping” as a database operation either.

        The main components would probably be:

        1. Decide on a new scheme (with more digits)
        2. Create a mapping from the old scheme to the new scheme. (that’s where existing duplicates would get removed)
        3. Let people use both during some transition period, after which the old one isn’t valid any more.
        4. Decide when you’re going to stop issuing old SSNs and only issue new ones to people born after some date.

        There’s a lot of complication in each of those steps but none of them are particularly dependent on “de-duped” databases.

      • DacoTaco@lemmy.world
        7 days ago

        Just read the format of the US SSN in that Wikipedia article. That wasn’t a smart format to use lol. It only supports 99*9999 (roughly 1 million) people per area number. No wonder numbers are reused.
        In some countries it’s birthday + sequence number (which also encodes gender) + checksum, and that has been working since the 80’s.
        Before that we had a different number, but it wasn’t future-proof (much like the US SSN), so we migrated away in the 80’s :')

        • Wispy2891@lemmy.world
          7 days ago

          In my country the only way that two people get the same number is if they were born on the same day (±1 century), in the same city, and have the same name and family name. It is extremely difficult to get duplicates that way (exception: immigrants, because the “city code” is the same for a whole foreign country, so it’s not impossible that there are two Ananya Guptas born on the same day in the whole of India).

          • DacoTaco@lemmy.world
            6 days ago

            Oh yeah, our system wouldn’t fit India as it’s limited to 500 births a day (the sequence is 3 digits, and whether it’s even or odd describes your gender). Your system seems fine to me and beats the US system hands down haha
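
A rough sketch of an identifier built the way this sub-thread describes: birth date, then a 3-digit sequence, then a checksum. The exact layout (`YYMMDD` + sequence + 2 check digits) and the mod-97 check are assumptions on my part, borrowed from how some national numbering schemes work; the comments above do not say which checksum or country is meant, and `make_id` and `is_valid` are hypothetical names.

```python
from datetime import date


def make_id(birth: date, sequence: int) -> str:
    """Build an 11-digit ID: YYMMDD + 3-digit sequence + 2 check digits."""
    if not 1 <= sequence <= 999:
        raise ValueError("sequence must fit in three digits")
    base = f"{birth:%y%m%d}{sequence:03d}"
    # Assumed checksum: 97 minus the remainder of the first nine digits.
    check = 97 - int(base) % 97
    return f"{base}{check:02d}"


def is_valid(id_number: str) -> bool:
    """Recompute the check digits and compare them to the stored ones."""
    if len(id_number) != 11 or not id_number.isdigit():
        return False
    base, check = id_number[:9], int(id_number[9:])
    return 97 - int(base) % 97 == check


example = make_id(date(1985, 7, 14), 123)
print(example, is_valid(example))
```

With three digits and parity reserved for gender, each day has roughly 500 slots per parity, which matches the capacity limit mentioned above. The two-digit year is also where the "±1 century" ambiguity would come from.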

  • KillingTimeItself@lemmy.dbzer0.com
    7 days ago

    TL;DR de-duplication in that form refers to a technique where two different files in the file system are backed by one single piece of data on the drive, the intention being to optimize file storage size and minimize fragmentation.

    You can imagine this would be very useful when taking backups, for instance. We call this a “Copy on Write” approach, since generally it works by copying the existing file to a second reference point, where you can then add an edit on top of the original file while retaining 100% of the original file size and both copies of the file (it’s more complicated than this obviously, but you get the idea).

    now just to be clear, if you did implement this in a DB, which you could do fairly trivially, this would change nothing about how the DB operates. It wouldn’t remove “duplicates”; it would only coalesce duplicate data into one single tree to optimize disk usage. I have no clue what Elon thinks it does.

    The problem here, as a non programmer, is that i don’t understand why you would ever de-duplicate a database. Maybe there’s a reason to do it, but i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another, or what elon is implying here (remove “duplicate” entries, however that’s supposed to work)

    Elon doesn’t know what “de-duplication” is, and i don’t know why you would ever want that in a DB; it seems like a really good way to explode everything.

    • valtia@lemmy.world
      7 days ago

      i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another

      Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.

      what elon is implying here (remove “duplicate” entries, however that’s supposed to work)

      Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.

      • KillingTimeItself@lemmy.dbzer0.com
        6 days ago

        Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.

        in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case. Maybe even use historical backups or CoW to retain that kind of data.

        Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.

        and naturally, he doesn’t know what the term “de-duplication” means. Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.

        • valtia@lemmy.world
          6 days ago

          in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case.

          … That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there

          Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.

          … I don’t think you understand how modern databases are designed

          • KillingTimeItself@lemmy.dbzer0.com
            6 days ago

            … That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there

            You were talking about not keeping historical data, which is one of the proposed reasons you would have “duplicate” entries; i was just clarifying that.

            … I don’t think you understand how modern databases are designed

            it’s my understanding that when it comes to storing data that it shouldn’t be possible to have two independent stores of the exact same thing, in two separate places, you could have duplicate data entries, but that’s irrelevant to the discussion of de-duplication aside from data consolidation. Which i don’t imagine is an intended usecase for a DB. Considering that you literally already have one identical entry. Of course you could simply make it non identical, that goes without saying.

            Also, we’re talking about the DB used for the social security database, not fucking tigerbeetle.

      • DacoTaco@lemmy.world
        7 days ago

        SSN being unique isn’t a dumb idea, it’s a very smart idea, but due to the US SSN format it’s impossible to do. Hence, to implement the idea you’d first need to change the SSN format so that it is unique.

        Also, Elon’s remark is stupid as is. I’m sure the row has a unique ID, even if it’s just a rowid column.

  • Garlicsquash@lemmings.world
    7 days ago

    Having never seen the database schema myself, my read is that the SSN is used as a primary key in one table, and many other tables likely use that as a foreign key. He probably doesn’t understand that foreign keys are used as links and should not be de-duplicated, as that breaks the key relationship in a relational database. As others have mentioned, even in the main table there are probably reused or updated SSNs that would then be multiple rows that have timestamps and/or Boolean flags for current/expired.

    • werefreeatlast@lemmy.world
      7 days ago

      If this is true, then by this time we are all fucked. Like Monday someone checks their banking or retirement and it’s all gone. That’s gonna be a crazy day.

      I hope they’re not using the actual SSN as the primary key. I hope it’s a big ass number that is otherwise unrelated.

  • SolidShake@lemmy.world
    8 days ago

    How come Republicans keep saying that doggy is going to expose all the fraud in the government, and yet the biggest fraud, with 37 felonies, is president? What the actual fuck do these people think?

  • skozzii@lemmy.ca
    8 days ago

    Musk is the walking Dunning-Kruger effect; he is too stupid to realize how terrible he sounds.

  • missingno@fedia.io
    8 days ago

    Because SQL is everywhere. If Musk knew what it was, he would know that the government absolutely does use it.

    • credo@lemmy.world
      8 days ago

      This explanation makes no sense in the context of OP’s question, given the order of comments…

      • finitebanjo@lemmy.world
        8 days ago

        Yeah, a better explanation is that Deduplicating Databases are an absolutely terrible idea for every use case, as it means deleting history from the database.

  • Geometrinen_Gepardi@sopuli.xyz
    8 days ago

    Rows in a SQL table have a primary key which works as the unique identifier for that row. The primary key can be as simple as an incrementing number.

      • knightly the Sneptaur@pawb.social
        8 days ago

        Not unless the data associated with that SSN is itself inconsistent.

        For example, when multiple people are fraudulently using the same SSN, the fraud monitoring DB would necessarily need to record several entries with the same SSN.
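
A hedged sketch of that fraud-monitoring case, again with sqlite3. The `fraud_reports` table and the names in it are made up; the point is only that the repeated SSN is the signal you query for, not something to strip out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fraud_reports (
        report_id    INTEGER PRIMARY KEY,  -- surrogate key, unique per row
        ssn          TEXT NOT NULL,        -- deliberately NOT unique
        claimed_name TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO fraud_reports (ssn, claimed_name) VALUES (?, ?)",
    [
        ("123-45-6789", "Alex Example"),
        ("123-45-6789", "Blake Sample"),
        ("987-65-4321", "Casey Placeholder"),
    ],
)

# SSNs that were reported under more than one name.
suspicious = conn.execute(
    """
    SELECT ssn, COUNT(DISTINCT claimed_name) AS names
    FROM fraud_reports
    GROUP BY ssn
    HAVING COUNT(DISTINCT claimed_name) > 1
    """
).fetchall()

print(suspicious)  # [('123-45-6789', 2)]
```

Forcing `ssn` to be unique here would make it impossible to record the second claimant at all.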

        • DahGangalang@infosec.pubOP
          8 days ago

          Ah the old “malware detectors have the selectors for malware and so they show up as malware to other malware detection systems” problem.

          Yeah, that seems like a reasonable case to have duplicate SSNs.

  • Nate Cox@programming.dev
    8 days ago

    Because a simple query would have shown that SSN was a compound key with another column (birth date, I think), and not the identifier he thinks it is.

    • BombOmOm@lemmy.world
      8 days ago

      Why would one person with one SSN ever have two different birth dates? That sounds like an issue all unto itself.

      • DahGangalang@infosec.pubOP
        8 days ago

        A weak example would be my grandma. She was born before Social Security and was told as a kid she was born in 1938. I guess in the olden days you just didn’t need to pass your birth certificate around for anything, so it wasn’t until she went to get married at ~age 25 that she found out her birth certificate actually said she was born in 1940 (I forget the actual years, but I remember it was a two year and two day gap between dates).

        It’s a weak example that should apply to only a microscopic portion of the population, but I could see her having some weird records in the databases as a result.

      • geoff@lemm.ee
        8 days ago

        I think what he means is that the unique identifier for a database record is a composite of two fields: SSN + birth date. That doesn’t mean that SSN to birth date is a one-to-many relation.

        • DahGangalang@infosec.pubOP
          8 days ago

          But they are implying SSN to SSN+Birthdate is a one-to-many relationship. Since SSN to SSN should be one-to-one, you can conclude the SSN to Birthdate is one-to-many, right?

          • Nate Cox@programming.dev
            8 days ago

            No, who said there was a relationship?

            A compound key is a composite key where one or both sides can be foreign keys to other tables themselves; it’s a safe assumption this is probably true in a large data set like social security. A composite key is a candidate key (a uniquely identified key) made up of more than one column.

            This basically means that there is a finite number of available SSNs because they’re only 9 digits long and someone intends to recycle SSNs after the current user of one dies. Linking it to birthday is “unique enough” as to never recur.
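
For anyone who wants to see what a composite key actually enforces, here is a small sqlite3 sketch. Whether the real SSA schema uses `(ssn, birth_date)` as its key is a guess made by commenters in this thread, and the `people` table below is purely illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE people (
        ssn        TEXT NOT NULL,
        birth_date TEXT NOT NULL,
        full_name  TEXT NOT NULL,
        PRIMARY KEY (ssn, birth_date)  -- uniqueness applies to the PAIR
    )
    """
)
conn.execute("INSERT INTO people VALUES ('123-45-6789', '1950-03-01', 'Alex Example')")
# Same SSN with a different birth date: accepted, because the pair is new.
conn.execute("INSERT INTO people VALUES ('123-45-6789', '1990-08-15', 'Blake Sample')")

# Same SSN AND same birth date: rejected by the primary key constraint.
try:
    conn.execute("INSERT INTO people VALUES ('123-45-6789', '1950-03-01', 'Someone Else')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print(rejected, row_count)  # True 2
```

So a query that looks at the `ssn` column alone would report "duplicates" even though every row satisfies the key constraint.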

            • DahGangalang@infosec.pubOP
              8 days ago

              I think I was getting some wires crossed and/or misunderstood what geoff (parent commentor to my last comment) was saying, so my comment may be misdirected some.

              But according to The Social Security FAQ page, SSNs are not recycled, so that data (especially when compounded and hashed with other data) should be able to establish a one-to-one relationship between each primary key and an SSN, thusly having SSNs appear associated with multiple primary keys is a concern.

              Other comments have pointed to other explanations for why SSNs could appear to occur multiple times, but those amount to “it appeared in a different field associated with the same primary key”. I think that’s the most likely explanation of things.

              • jj4211@lemmy.world
                8 days ago

                Note that it being only part of a key is a technology choice that does not require reality to map to it. It may seem like overkill, but someone may not trust the political process to preserve that promise, and so they add the birth date just in case something goes sideways in the future. Lots of technical choices are made anticipating likely changes and problems and designing things to be extra robust in the face of those.

  • jacksilver@lemmy.world
    8 days ago

    If SSNs are used as a primary key (a unique identifier for a row of data) then they’d have to be duplicated to be able to merge data together.

    However, even if they aren’t using the SSN as an identifier (since it’s sensitive information), it’s not uncommon to repeat data, whether for speed/performance, for simplicity in table design, because it sits in a lookup table, or because you have disconnected tables.

    Having a value repeated doesn’t tell you anything about fraud risk, efficiency, or really anything. Using it as the primary piece of evidence for a claim isn’t a strong argument.

    • credo@lemmy.world
      8 days ago

      This is the answer… it seems few on lemmy have ever normalized a database. But they do know how to give answers!

      • jacksilver@lemmy.world
        8 days ago

        Thanks, OP seemed more curious about the technical aspects than just the absurdity of the comment (since pretty much every business uses SQL) so hoped a more technical explanation might be appreciated.

    • DahGangalang@infosec.pubOP
      8 days ago

      This sounds like a reasonable argument.

      Can you pass any resources with examples on when having duplicate values would be useful/best practices?

  • 9point6@lemmy.world
    8 days ago

    The statement “this [guy] thinks the government uses SQL” demonstrates a complete and total lack of knowledge as to what SQL even is. Every government on the planet makes extensive and well documented use of it.

    The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.

    If he knew the domain, he would know this isn’t an issue. If he knew the technology he would be able to see the constraint and following investigation, reach the conclusion that it’s not an issue.

    The man continues to be a malignant moron

    • snooggums@lemmy.world
      8 days ago

      The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.

      Since SSNs are never reused, what would be the purpose of using the SSN and birth date together as part of the primary key? I guess it is the one thing that isn’t supposed to ever change (barring a clerical error) so I could see that as a good second piece of information, just not sure what it would be adding.

      Note: if duplicate SSNs are accidentally issued my understanding is that they issue a new one to one of the people and I don’t know how to find the start of the thread on twitter since I only use it when I accidentally click on a link to it.

      https://www.ssa.gov/history/hfaq.html

      Q20: Are Social Security numbers reused after a person dies?

      A: No. We do not reassign a Social Security number (SSN) after the number holder’s death. Even though we have issued over 453 million SSNs so far, and we assign about 5 and one-half million new numbers a year, the current numbering system will provide us with enough new numbers for several generations into the future with no changes in the numbering system.

      • halcyonloon@midwest.social
        8 days ago

        Take this with a grain of salt as I’m not a dev, but do work on CMS reporting for a health information tech company. Depending on how the database is designed an SSN could appear in multiple tables.

        In my experience de-duplication happens as part of generating a report, so that all relevant data related to a key and the scope of the report can be gathered from the various tables.

        • snooggums@lemmy.world
          8 days ago

          It is common for long lived databases with a rotating cast of devs to use different formats in different tables as well! One might have it as a string, one might have it as a number, and the other might have it with hyphens in the same database.

          Hell, I work in a state agency and one of our older databases has a dozen tables with phone numbers.

          • One has the whole thing as a long int: 2223334444
          • One has the whole thing as a string: 2223334444 (which of course can’t be directly compared to the one that is a long int…)
          • One has separate fields for area code and the rest with a hyphen: 222 and 333-4444
          • One has the whole thing with parentheses, a space, and a hyphen as a string: (222) 333-4444

          The main reason for the discrepancy is not looking at what was used before, or not understanding that the formatting can always be applied at display time, so there is no need to store the parentheses or hyphens in the database itself.
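
As a sketch of how the storage formats in the list above could be reconciled before comparing them, here is a small normalizer. `normalize_phone` is a hypothetical helper and the 10-digit length check is an assumption that fits the example values, not a rule from any real system.

```python
import re


def normalize_phone(value) -> str:
    """Reduce any of the stored formats to a bare 10-digit string."""
    digits = re.sub(r"\D", "", str(value))  # drop everything that is not a digit
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {value!r}")
    return digits


# The same number as it might sit in four different tables.
stored = [
    2223334444,           # long int
    "2223334444",         # plain string
    ("222", "333-4444"),  # separate area code and remainder
    "(222) 333-4444",     # fully formatted string
]

normalized = [
    normalize_phone("".join(v) if isinstance(v, tuple) else v) for v in stored
]
print(normalized)
```

Once everything is in one canonical form, the four tables can be joined or compared directly, and presentation formatting stays out of the stored value.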

          • pixxelkick@lemmy.world
            8 days ago

            Okay, but if that happens, Musk is right that that’s a bit of a denormalization issue that maybe needs resolving.

            SSNs should be stored as strings without any hyphens or additional markup, nothing else.

            • Storing as a number can cause issues if you ever wanna support leading zeros
            • any “styling” like hyphens should be handled by a consuming front end system; you want only the important data in the DB to minimize query times
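
A minimal sketch of why a numeric type is risky for identifiers that can start with zero, and of applying hyphens only at display time. `format_ssn` is a hypothetical helper name, and the sample value is made up.

```python
def format_ssn(raw: str) -> str:
    """Add hyphens for display only; storage keeps the bare 9-digit string."""
    if len(raw) != 9 or not raw.isdigit():
        raise ValueError("expected exactly 9 digits")
    return f"{raw[:3]}-{raw[3:5]}-{raw[5:]}"


stored_as_text = "012345678"
stored_as_int = int(stored_as_text)  # the leading zero cannot survive this

print(str(stored_as_int))          # 12345678
print(format_ssn(stored_as_text))  # 012-34-5678
```

The integer round trip loses a digit, while the string keeps the value intact and lets the front end decide how to show it.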

            It’s more likely though it’s just a composite key…

            • snooggums@lemmy.world
              8 days ago

              This is not what he is actively doing though. He isn’t trying to improve databases.

              He is tearing down entire departments and agencies and using shit like this to justify it.

              • pixxelkick@lemmy.world
                8 days ago

                Sure but my point is, if it was the scenario you described, then Elon would be talking about the right kind of denormalization problem.

                Denormalization due to multiple different tables storing their own copies of the same data, in different formats worse yet, would actually be the kind of problem he’s tweeting about.

                As opposed to a composite key on one table, which would just mean he’s being an ultracrepidarian, as usual.

                • snooggums@lemmy.world
                  8 days ago

                  Musk canceled the support for the long running Common Education Data Standards (CEDS) which is an initiative to promote better database standards and normalization for the states to address this kind of thing.

                  It does not fucking matter if he is technically correct about one tiny detail because he is only using it to destroy, not to improve efficiency.

        • Ephera@lemmy.ml
          8 days ago

          The SSN is likely to appear in multiple tables, because they will reference a central table that ties it all together. This central table will likely only contain the SSN, the birth date (from what others have been saying), as well as potentially first and last name. In this table, the entries have to be unique.
          But then you might have another table, like a table listing all the physical exams, which has the SSN to be able to link it to the person’s name, but ultimately just adds more information to this one person. It does not duplicate the SSN in a way that would be bad.

        • DahGangalang@infosec.pubOP
          8 days ago

          A given SSN appearing in multiple tables actually makes sense. To someone not familiar with SQL (i.e. at about my level of understanding), I could see that being misinterpreted as having multiple SSN repeated “in the database”.

          Of all the comments so far, I find yours the most compelling.

          • Barbarian@sh.itjust.works
            8 days ago

            Theoretically, yeah, that’s one solution. The more reasonable thing to do would be to use the foreign key though. So, for example:

            SSN_Table

            ID | SSN | Other info

            Other_Table

            ID | SSN_ID | Other info

            When you want to connect them to have both sets of info, it’d be the following:

            SELECT * FROM SSN_Table JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID;

            EDIT: Oh, just to clear up any confusion, the SSN_ID in this simple example is not the SSN itself. To access that in this example query, it’d be SSN_Table.SSN
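
Here is a runnable version of that two-table layout using Python's sqlite3. Table and column names follow the comment above; the column types and sample rows are my own assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SSN_Table (
        ID  INTEGER PRIMARY KEY,
        SSN TEXT NOT NULL
    );
    CREATE TABLE Other_Table (
        ID         INTEGER PRIMARY KEY,
        SSN_ID     INTEGER NOT NULL REFERENCES SSN_Table(ID),  -- foreign key
        Other_Info TEXT
    );
    INSERT INTO SSN_Table (ID, SSN) VALUES (1, '123-45-6789');
    INSERT INTO Other_Table (SSN_ID, Other_Info)
        VALUES (1, 'first record'), (1, 'second record');
    """
)

rows = conn.execute(
    """
    SELECT SSN_Table.SSN, Other_Table.Other_Info
    FROM SSN_Table
    JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID
    ORDER BY Other_Table.ID
    """
).fetchall()

stored_ssn_rows = conn.execute("SELECT COUNT(*) FROM SSN_Table").fetchone()[0]
print(rows)
print(stored_ssn_rows)  # 1
```

Note that the SSN is stored once but appears on every joined row, so a query result can look full of "duplicates" that do not exist in storage.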

            • DahGangalang@infosec.pubOP
              8 days ago

              Yeah, databases are complicated and make my head hurt. Glancing through resources from other comments, I’m realizing I know next to nothing about database optimization. Like, my gut reaction to your comment is that it seems like unnecessary overhead to have that data across two tables - but if one sub-dept didn’t need access to the raw SSN, but did need access to less personal data, I could see those stored in separate tables.

              But anyway, you’re helping clear things up for me. I really appreciate the pseudo code level example.

              • Barbarian@sh.itjust.works
                8 days ago

                It’s necessary to split it out into different tables if you have a one-to-many relationship. Let’s say you have a list of driver licenses the person has had over the years, for example. Then you’d need the second table. So something like this:

                SSN_Table

                ID | SSN | Other info

                Driver_License_Table

                ID | SSN_ID | Issue_Date | Expiry_Date | Other_Info

                Then you could do something like pull up a person’s latest driver’s license, or list all the ones they had, or pull up the SSN associated with that license.
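
The one-to-many case above can be sketched like this. Column types, the ISO date strings, and the sample licenses are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE SSN_Table (ID INTEGER PRIMARY KEY, SSN TEXT NOT NULL);
    CREATE TABLE Driver_License_Table (
        ID          INTEGER PRIMARY KEY,
        SSN_ID      INTEGER NOT NULL REFERENCES SSN_Table(ID),
        Issue_Date  TEXT NOT NULL,  -- ISO dates sort correctly as text
        Expiry_Date TEXT NOT NULL
    );
    INSERT INTO SSN_Table (ID, SSN) VALUES (1, '123-45-6789');
    INSERT INTO Driver_License_Table (SSN_ID, Issue_Date, Expiry_Date) VALUES
        (1, '2010-05-01', '2015-05-01'),
        (1, '2015-04-20', '2020-04-20'),
        (1, '2020-04-10', '2025-04-10');
    """
)

# Every license this person has ever held, oldest first.
all_licenses = conn.execute(
    "SELECT Issue_Date FROM Driver_License_Table WHERE SSN_ID = 1 ORDER BY Issue_Date"
).fetchall()

# Only the most recent license, together with the SSN it belongs to.
latest = conn.execute(
    """
    SELECT s.SSN, d.Issue_Date, d.Expiry_Date
    FROM Driver_License_Table AS d
    JOIN SSN_Table AS s ON s.ID = d.SSN_ID
    WHERE d.SSN_ID = 1
    ORDER BY d.Issue_Date DESC
    LIMIT 1
    """
).fetchone()

print(len(all_licenses), latest)
```

One person row fans out to three license rows, which is exactly the situation a single flat table cannot represent without repeating the person's details.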

                • Arcka@midwest.social
                  7 days ago

                  I think a likely scenario would be for name changes, such as taking your partner’s surname after marriage.

            • schteph@lemmy.world
              8 days ago

              This is true, but there are many instances where denormalization makes sense and is frequently used.

              A common example is a table that is frequently read. Instead of going to the “central” table the data is denormalized for faster access. This is completely standard practice for every large system.

              There’s nothing inherently wrong with it, but it can be easily misused. With SSNs, I’d think the most stupid thing to do is to use one as the primary key. The second would be to ignore the security risks that are ingrained in an SSN. The federal government, being as large as it is, I’m sure has instances of both; however, since Musky is using his posse of young, arrogant brogrammers, I’m positively certain they’re completely ignoring the security aspect.

              • Barbarian@sh.itjust.works
                8 days ago

                Yeah, I work daily with a database with a very important non-ID field that is denormalized throughout most of the database. It’s not a common design pattern, but it is done from time to time.

              • esa@discuss.tchncs.de
                8 days ago

                To be a bit more generic here, when you’re at government scale you’re generally deep in trade-off territory. Time and space are frequently opposed values and you have to choose which one is most important, and consider the expenses of both.

                E.g. caching is duplicating data to save time. Without it we’d have lower storage costs, but longer wait times and more network traffic.

              • DahGangalang@infosec.pubOP
                8 days ago

                Yeah, no one appreciates security.

                I probably overused that saying to explain it: ‘if there’s no break ins, why do we pay for security? Oh, there was a break in - what do we even pay security for?’

      • DahGangalang@infosec.pubOP
        8 days ago

        Beat me to asking this follow up, though you linking additional resources is probably more effort than I would have put in. Thanks for that!

  • snooggums@lemmy.world
    8 days ago

    If he doesn’t think the government uses SQL after having his goons break into multiple government servers, he is an idiot.

    If he is lying to cover his ass for fucking up so many things (the more likely explanation), then saying “he never used SQL” is basically a dig at how technically inept he really is despite bragging about being a tech bro.

  • turtle [he/him]@lemm.ee
    8 days ago

    I saw a comment about this in the last couple of days that was really interesting and educational. Unfortunately I can’t seem to find it again to link it, but the gist of it was that there would be two things wrong with using SSNs as primary keys in a SQL database:

    • You should not use externally generated data as primary keys
    • You should not use personally identifying data as primary keys

    Using SSNs as keys would violate both.
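    Here is a rough sketch of the alternative, using Python's built-in `sqlite3` module. The `person` table, its column names, and the obviously fake SSNs are my own assumptions for illustration, not anything known about a real government schema. The primary key is an internally generated number, and the SSN is just an indexed column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,  -- internally generated surrogate key
        ssn       TEXT NOT NULL,        -- external, personally identifying: ordinary column
        full_name TEXT NOT NULL
    )
    """
)
# Index the SSN for lookups without making it the row's identity.
conn.execute("CREATE INDEX idx_person_ssn ON person (ssn)")

cur = conn.execute(
    "INSERT INTO person (ssn, full_name) VALUES (?, ?)",
    ("000-00-0021", "Jane Example"),  # fake SSN, last two digits transposed at data entry
)
jane_id = cur.lastrowid

# Fixing the typo touches one column; the key other tables would point at is unchanged.
conn.execute("UPDATE person SET ssn = ? WHERE person_id = ?", ("000-00-0012", jane_id))
row = conn.execute(
    "SELECT person_id, ssn FROM person WHERE full_name = 'Jane Example'"
).fetchone()
```

    With this layout a data-entry error in the SSN is an ordinary column update, and the SSN never has to be copied into other tables as a foreign key.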

    I went looking for best practices regarding SQL primary keys and found this really interesting post and discussion on Stack Overflow:

    https://stackoverflow.com/questions/337503/whats-the-best-practice-for-primary-keys-in-tables

    My first thought was that people’s SSNs can and do change, and sometimes (rarely?) people may have more than one SSN. Like someone mentions in that link, human error would be another reason why you would not want to use external data and particularly SSNs as primary keys.

    • DahGangalang@infosec.pubOP
      8 days ago

      From what I’m seeing in other comments, it seems SSNs aren’t used as primary keys, but they are part of generating the primary key. I haven’t seen anyone directly say it, but it sounds like the primary key is a hash of SSN + DOB (I hope with more data to add entropy, because that’s still a tiny bit of data to build a rainbow table from).

      Still, assuming we haven’t begun re-using SSNs, it seems concerning to me that an SSN is appearing multiple times in the database. It seems a safe assumption that the uniqueness of an SSN should make the resultant hash unique, so an SSN appearing as associated to multiple primary keys should be a concern, right?
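      To make the entropy worry concrete, here is a small Python sketch. The hashing scheme is purely my guess from the comments above (nobody has confirmed how any real key is built), and `plain_key`, `keyed_key`, and the fake SSNs are hypothetical names and values for illustration.

```python
import hashlib
import hmac


def plain_key(ssn: str, dob: str) -> str:
    """Unsalted SHA-256 of SSN + DOB (a guessed scheme, not a confirmed one)."""
    return hashlib.sha256(f"{ssn}|{dob}".encode()).hexdigest()


def keyed_key(ssn: str, dob: str, secret: bytes) -> str:
    """HMAC-SHA-256 variant: precomputed tables are useless without the secret."""
    return hmac.new(secret, f"{ssn}|{dob}".encode(), hashlib.sha256).hexdigest()


# Rough upper bound on possible inputs: 9 decimal digits of SSN
# times roughly 120 years of possible birth dates.
ssn_space = 10**9
dob_space = 120 * 366
total_inputs = ssn_space * dob_space  # about 4.4e13 candidates

a = plain_key("000-00-0001", "1950-01-01")
b = plain_key("000-00-0001", "1950-01-01")  # same inputs, same key
c = plain_key("000-00-0001", "1950-01-02")  # same SSN, different DOB, different key
```

      Two things fall out of this. The input space is small enough that exhaustively hashing it is plausible, which is why a keyed hash or extra secret input matters. And under this guessed scheme, one SSN recorded with two different birth dates would produce two different keys, so an SSN tied to multiple keys could come from a data-entry inconsistency as easily as from anything sinister.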

      Other comments have led me to believe the “duplicate SSNs” are probably appearing in “different fields” (e.g. a dead man’s SSN would appear directly associated to him, but also as a sort of “collecting payments from” entry in his living wife’s entry). That would be a misrepresentation of the facts (which we know Vice Bro, Elon Musk the Wise and Honest would never do). Occam’s Razor though has me leaning in that direction.
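      A toy version of that scenario, again with Python's `sqlite3`. The `beneficiary` table and the `claims_on_ssn` column are invented for illustration; I have no idea what the real tables look like.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE beneficiary (
        beneficiary_id INTEGER PRIMARY KEY,
        own_ssn        TEXT NOT NULL,
        claims_on_ssn  TEXT  -- hypothetical: SSN of the worker whose record pays out
    );
    INSERT INTO beneficiary (own_ssn, claims_on_ssn) VALUES
        ('000-00-0001', NULL),
        ('000-00-0002', '000-00-0001');
    """
)

# Naive count: every place the SSN appears, regardless of which column it is in.
naive = conn.execute(
    """
    SELECT COUNT(*) FROM (
        SELECT own_ssn AS ssn FROM beneficiary
        UNION ALL
        SELECT claims_on_ssn FROM beneficiary WHERE claims_on_ssn IS NOT NULL
    ) WHERE ssn = '000-00-0001'
    """
).fetchone()[0]

# Stricter count: how many people hold that SSN as their own.
owners = conn.execute(
    "SELECT COUNT(*) FROM beneficiary WHERE own_ssn = '000-00-0001'"
).fetchone()[0]
```

      The naive count finds the SSN twice while only one person actually holds it, which is the kind of “duplicate” that is expected behavior and not evidence of fraud.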

      • turtle [he/him]@lemm.ee
        8 days ago

        That all makes sense, except if someone’s SSN changes (which happens under certain circumstances), doesn’t that invalidate their primary key or require a much more complicated operation of issuing a new record and relinking all the existing relationships?
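        Here is a small sketch of that problem using Python's `sqlite3`, with a made-up two-table schema where the SSN itself is the primary key (fake SSNs, hypothetical table names).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(
    """
    CREATE TABLE person (ssn TEXT PRIMARY KEY, full_name TEXT NOT NULL);
    CREATE TABLE payment (
        payment_id INTEGER PRIMARY KEY,
        ssn        TEXT NOT NULL REFERENCES person (ssn),
        amount     INTEGER NOT NULL
    );
    INSERT INTO person VALUES ('000-00-0001', 'Jane Example');
    INSERT INTO payment (ssn, amount) VALUES ('000-00-0001', 100), ('000-00-0001', 250);
    """
)

# Changing the key in place would orphan the payment rows, so the database refuses.
try:
    conn.execute("UPDATE person SET ssn = '000-00-0009' WHERE ssn = '000-00-0001'")
    update_blocked = False
except sqlite3.IntegrityError:
    update_blocked = True

still_linked = conn.execute(
    "SELECT COUNT(*) FROM payment WHERE ssn = '000-00-0001'"
).fetchone()[0]
```

        Getting around this means either declaring the foreign key with `ON UPDATE CASCADE` or rewriting every referencing row by hand, which is the relinking headache I mean.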

        I can imagine an SSN existing in more than one primary key due to errors. If they use SSNs in the primary key at all, but combined with something else, that leads me to believe that the designers felt that SSNs weren’t reliable enough to be a pure primary key.

        I agree with you about Occam’s Razor. The guy has demonstrated multiple times that he’s a dishonest moron.

        • DahGangalang@infosec.pubOP
          8 days ago

          I’m not familiar with cases where someone’s SSN could change. Could you link to resources on when that would happen?

          • turtle [he/him]@lemm.ee
            8 days ago

            I don’t have any resources handy, but I do know someone who this happened to: they were an immigrant who got an SSN the first time they migrated to the US, went back to live in their country for a number of years, then returned to the US and I guess applied for an SSN again. Voilà, two SSNs and a mess.

            • DahGangalang@infosec.pubOP
              8 days ago

              Yeah, I can imagine that’d be an administrative headache. I do not envy them the opportunity of sorting that out.

              Thanks for the example though. That makes sense.

        • snooggums@lemmy.world
          8 days ago

          That all makes sense, except if someone’s SSN changes (which happens under certain circumstances), doesn’t that invalidate their primary key or require a much more complicated operation of issuing a new record and relinking all the existing relationships?

          Yes, in the case of duplicate SSN assignments for two people (rare) you would need to change their records to align with the new SSN while not changing the records that go to the person who keeps the SSN. We do it with state identifiers and it is a gigantic pain in the ass.

          If two numbers are assigned to the same person merging them to one of the two is far easier.
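          A sketch of the easy case (one person, two records) in Python with `sqlite3`. The tables, IDs, and fake SSNs are made up for illustration and are not our actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        ssn       TEXT NOT NULL,
        full_name TEXT NOT NULL
    );
    CREATE TABLE payment (
        payment_id INTEGER PRIMARY KEY,
        person_id  INTEGER NOT NULL,
        amount     INTEGER NOT NULL
    );
    INSERT INTO person VALUES
        (1, '000-00-0001', 'Sam Example'),
        (2, '000-00-0002', 'Sam Example');
    INSERT INTO payment (person_id, amount) VALUES (1, 100), (2, 250);
    """
)

keep_id, drop_id = 1, 2
with conn:  # one transaction: either both steps land or neither does
    # Repoint everything that referenced the record being retired...
    conn.execute("UPDATE payment SET person_id = ? WHERE person_id = ?", (keep_id, drop_id))
    # ...then remove the now-unreferenced duplicate person row.
    conn.execute("DELETE FROM person WHERE person_id = ?", (drop_id,))

total = conn.execute(
    "SELECT SUM(amount) FROM payment WHERE person_id = ?", (keep_id,)
).fetchone()[0]
remaining = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

          The merge is a blanket repoint of every row. The split case is the painful one because someone has to decide, record by record, which of the two people each row belongs to.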

      • jballs@sh.itjust.works
        8 days ago

        I think the thing that’s catching you up the most is that you’re assuming Elon has the slightest clue what he’s talking about. In your mind, you’ve read the words “the social security database” from his post and have made assumptions about what that means.

        I’ve worked with databases for 20+ years, several of those being years working on federal government systems. Each agency has dozens or possibly hundreds of databases all used for different purposes. Saying “the social security database” is so fucking general that it’s basically nonsensical. It’d be like saying “Ford’s car database”.

        Elon clearly heard someone technical talking about something, then misinterpreted it for his own purposes to justify what he is doing by destroying our government institutions. His follow-up of saying the government doesn’t use SQL just reinforces that point.

        Trying to logically backtrack into what he actually meant - and what the primary keys should be - is just sanewashing an insane statement.