Retain duplicates in the given list using Stream API

June 18, 2022, at 11:20 AM

I want to find the duplicates in a list of Identifier objects.

Example:

  1. NE , SN1
  2. NE , SN
  3. NE1 , SN
  4. NE , SN1

The resulting List should contain only NE, SN1.

class Identifier {
    String reference; // NE
    String number;    // SN1
    String digiId;    // not used for duplicate identification
    // getters and setters
}

public static List<Identifier> getDuplicates(List<Identifier> list) {
    Map<String, Map<String, Long>> noOfDuplicates = list.stream()
            .collect(Collectors.groupingBy(Identifier::getReference,
                    Collectors.groupingBy(Identifier::getNumber, Collectors.counting())));
    // How to process noOfDuplicates in Java 8 using streams to get the List<Identifier>?
}

Any help would be appreciated.

Answer 1

I assume that the equals/hashCode implementation of Identifier doesn't match your requirements and can't be changed.

From your example, I understood that you want to get one representative for each duplicate combination of number and reference.

The first approach uses streams. It groups the elements into a map, filters out buckets that hold a single value, picks one element from every bucket with duplicates, and collects the results into a List.

    public static List<Identifier> getDuplicates(List<Identifier> list) {
        return list.stream()
                .collect(Collectors.groupingBy(IdWrapper::new))
                .values().stream()
                .filter(identifiers -> identifiers.size() > 1)
                .map(identifiers -> identifiers.stream().findAny().orElseThrow())
                .toList();
    }

For this solution, the record IdWrapper can be implemented like this:

    record IdWrapper(String reference, String number) {
        IdWrapper(final Identifier identifier) {
            this(identifier.reference, identifier.number);
        }
    }
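A possible variant of the first approach, sketched here under a few assumptions: the map returned by the single-argument groupingBy makes no ordering guarantee, so the duplicates may come back in arbitrary order. If encounter order matters, passing LinkedHashMap::new as the map factory and taking the first element of each bucket is one way to get it. The class name OrderedDuplicatesDemo, the two-argument Identifier constructor, and its toString are assumptions made only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class OrderedDuplicatesDemo {

    // Stand-in for the question's Identifier; constructor and toString are assumed.
    static class Identifier {
        final String reference;
        final String number;

        Identifier(String reference, String number) {
            this.reference = reference;
            this.number = number;
        }

        @Override
        public String toString() {
            return "Identifier{reference='" + reference + "', number='" + number + "'}";
        }
    }

    // Same key record as in the answer: equality is based on reference and number.
    record IdWrapper(String reference, String number) {
        IdWrapper(Identifier identifier) {
            this(identifier.reference, identifier.number);
        }
    }

    static List<Identifier> getDuplicates(List<Identifier> list) {
        // LinkedHashMap keeps the buckets in the order their keys were first seen.
        Map<IdWrapper, List<Identifier>> groups = list.stream()
                .collect(Collectors.groupingBy(IdWrapper::new, LinkedHashMap::new, Collectors.toList()));

        return groups.values().stream()
                .filter(group -> group.size() > 1)   // keep only buckets with duplicates
                .map(group -> group.get(0))          // first occurrence represents the bucket
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Identifier> ids = List.of(
                new Identifier("NE", "SN1"),
                new Identifier("NE", "SN"),
                new Identifier("NE1", "SN"),
                new Identifier("NE", "SN1"),
                new Identifier("NE1", "SN"));
        // prints [Identifier{reference='NE', number='SN1'}, Identifier{reference='NE1', number='SN'}]
        System.out.println(getDuplicates(ids));
    }
}
```

Compared with findAny(), get(0) always returns the first occurrence of each duplicated combination, which may matter if the unused digiId field differs between occurrences.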

The second approach requires a single traversal over the given list.

Every element of the list is checked against a set: if the set rejects an element (add returns false), that element is a duplicate.

    public static List<Identifier> getDuplicates(List<Identifier> list) {
        Set<IdWrapper> wrappers = new HashSet<>();
        Set<IdWrapper> duplicates = new HashSet<>();
        for (Identifier id: list) {
            IdWrapper wrapper = new IdWrapper(id);
            if (!wrappers.add(wrapper)) {
                duplicates.add(wrapper);
            }
        }
        return unwrap(duplicates);
    }
    private static List<Identifier> unwrap(Collection<IdWrapper> duplicates) {
        return duplicates.stream()
                .map(IdWrapper::id) // a record's accessor is id(), not getId()
                .toList(); // Java 16+; on earlier versions use collect(Collectors.toList())
    }

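For the second approach, the record IdWrapper (it could be implemented as a class if you are using Java 15 or earlier) wraps the Identifier itself and establishes uniqueness, i.e. provides equals and hashCode, based on the number and reference of the Identifier object. Note that this definition differs from the two-component record used in the first approach.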

    record IdWrapper(Identifier id) {
        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;
            IdWrapper other = (IdWrapper) o;
            return this.id.reference.equals(other.id.reference) && this.id.number.equals(other.id.number);
        }
        @Override
        public int hashCode() {
            return Objects.hash(this.id.reference, this.id.number);
        }
    }
    public static void main(String[] args) {
        List<Identifier> ids = List.of(
                new Identifier("NE" , "SN1"),
                new Identifier("NE" , "SN"),
                new Identifier("NE1" , "SN"),
                new Identifier("NE" , "SN1")
        );
        System.out.println(getDuplicates(ids));
    }

Output (identical for both versions of getDuplicates()):

    [Identifier{reference='NE', number='SN1'}]
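For Java versions without records, the wrapper mentioned above could be written as a plain class. The following is a minimal sketch of the second approach using only Java 8 features; the class name DuplicatesJava8Demo, the Identifier constructor, and its toString are assumptions for illustration, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class DuplicatesJava8Demo {

    // Stand-in for the question's Identifier; constructor and toString are assumed.
    static class Identifier {
        final String reference;
        final String number;

        Identifier(String reference, String number) {
            this.reference = reference;
            this.number = number;
        }

        @Override
        public String toString() {
            return "Identifier{reference='" + reference + "', number='" + number + "'}";
        }
    }

    // Class-based equivalent of the record: equality uses reference and number only.
    static final class IdWrapper {
        private final Identifier id;

        IdWrapper(Identifier id) {
            this.id = id;
        }

        Identifier getId() {
            return id;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof IdWrapper)) return false;
            IdWrapper other = (IdWrapper) o;
            return Objects.equals(id.reference, other.id.reference)
                    && Objects.equals(id.number, other.id.number);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id.reference, id.number);
        }
    }

    static List<Identifier> getDuplicates(List<Identifier> list) {
        Set<IdWrapper> seen = new HashSet<>();
        Set<IdWrapper> duplicates = new LinkedHashSet<>(); // keeps detection order
        for (Identifier id : list) {
            IdWrapper wrapper = new IdWrapper(id);
            if (!seen.add(wrapper)) {      // add returns false if already present
                duplicates.add(wrapper);
            }
        }
        List<Identifier> result = new ArrayList<>();
        for (IdWrapper wrapper : duplicates) {
            result.add(wrapper.getId());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Identifier> ids = Arrays.asList(
                new Identifier("NE", "SN1"),
                new Identifier("NE", "SN"),
                new Identifier("NE1", "SN"),
                new Identifier("NE", "SN1"));
        // prints [Identifier{reference='NE', number='SN1'}]
        System.out.println(getDuplicates(ids));
    }
}
```

Using Objects.equals instead of calling equals on the field directly also tolerates a null reference or number, which the record version would not.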

Answer 2

If I understood your expected output correctly, this should work:

public static List<Identifier> getDuplicates(List<Identifier> list) {
    return list.stream().collect(Collectors.groupingBy(Function.identity()))
            .entrySet()
            .stream()
            .filter(e -> e.getValue().size() > 1)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
}

This is just one of the standard ways to get the duplicates in a list.

Just make sure that Identifier has proper equals and hashCode methods, since groupingBy(Function.identity()) relies on them; for example, something similar to this (but with your getters):

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Identifier that = (Identifier) o;
    return Objects.equals(reference, that.reference) && Objects.equals(number, that.number);
}
@Override
public int hashCode() {
    return Objects.hash(reference, number);
}
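If changing equals and hashCode is not an option, the nested counting map from the question can also be processed directly. The sketch below is not taken from either answer; it assumes Identifier has a two-argument constructor and the getters named in the question, and the class name NestedCountDemo is hypothetical. Because it rebuilds Identifier objects from the map keys, the digiId of the original elements is not carried over.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NestedCountDemo {

    // Stand-in for the question's Identifier; constructor and toString are assumed.
    static class Identifier {
        private final String reference;
        private final String number;

        Identifier(String reference, String number) {
            this.reference = reference;
            this.number = number;
        }

        String getReference() { return reference; }

        String getNumber() { return number; }

        @Override
        public String toString() {
            return "Identifier{reference='" + reference + "', number='" + number + "'}";
        }
    }

    static List<Identifier> getDuplicates(List<Identifier> list) {
        // reference -> (number -> occurrences), as in the question
        Map<String, Map<String, Long>> counts = list.stream()
                .collect(Collectors.groupingBy(Identifier::getReference,
                        Collectors.groupingBy(Identifier::getNumber, Collectors.counting())));

        // Flatten both levels and keep the pairs that occur more than once.
        return counts.entrySet().stream()
                .flatMap(byRef -> byRef.getValue().entrySet().stream()
                        .filter(byNum -> byNum.getValue() > 1)
                        .map(byNum -> new Identifier(byRef.getKey(), byNum.getKey())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Identifier> ids = Arrays.asList(
                new Identifier("NE", "SN1"),
                new Identifier("NE", "SN"),
                new Identifier("NE1", "SN"),
                new Identifier("NE", "SN1"));
        // prints [Identifier{reference='NE', number='SN1'}]
        System.out.println(getDuplicates(ids));
    }
}
```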